Testing Regex Generalizability And Its Implications: A Large-Scale Many-Language Measurement Study (ASE 2019 Research Papers)


Sun 10 - Fri 15 November 2019 San Diego, California, United States

Who

James C. Davis, Daniel Moyer, Ayaan M. Kazerouni, Dongyoon Lee

Track

ASE 2019 Research Papers

When

Wed 13 Nov 2019 11:00 - 11:20 at Cortez 1 - Testing and Program Analysis Chair(s): Jun Sun

Abstract

The regular expression (regex) practices of software engineers affect the maintainability, correctness, and security of their software applications. Empirical research has described characteristics like the distribution of regex feature usage, the structural complexity of regexes, and worst-case regex match behaviors. But researchers have not critically examined the methodology they follow to extract regexes, and findings to date are typically generalized from regexes written in only 1–2 programming languages. This is an incomplete foundation.

Generalizing existing research depends on validating two hypotheses: (1) Various regex extraction methodologies yield similar results, and (2) Regex characteristics are similar across programming languages. To test these hypotheses, we defined eight regex metrics to capture the dimensions of regex representation, string language diversity, and worst-case match complexity. We report that the two competing regex extraction methodologies yield comparable corpuses, suggesting that simpler regex extraction techniques will still yield sound corpuses. But in comparing regexes across programming languages, we found significant differences in some characteristics by programming language. Our findings have bearing on future empirical methodology, as the programming language should be considered, and generalizability will not be assured. Our measurements on a corpus of 537,806 regexes can guide data-driven designs of a new generation of regex tools and regex engines.

Link to Preprint

http://people.cs.vt.edu/davisjam/downloads/publications/DavisMoyerKazerouniLee-RegexGeneralizability-ASE19.pdf

File attachments

J. Davis's slides for "Testing Regex Generalizability and its Implications" (DavisMoyerKazerouniLee-RegexGeneralizability-ASE19-slides.pptx)	6.11MiB

James C. Davis

Virginia Tech, USA

United States

Daniel Moyer

Virginia Tech

Ayaan M. Kazerouni

Virginia Tech

United States

Dongyoon Lee

Stony Brook University

United States

Session Program

Wed 13 Nov

10:40 - 12:20: Papers - Testing and Program Analysis at Cortez 1
Chair(s): Jun Sun (Singapore Management University, Singapore)

10:40 - 11:00
Talk

Regexes are Hard: Decision-making, Difficulties, and Risks in Programming Regular Expressions (ACM SIGSOFT Distinguished Paper Award)

Louis G. Michael IV (Virginia Tech), James Donohue (University of Bradford), James C. Davis (Virginia Tech, USA), Dongyoon Lee (Stony Brook University), Francisco Servant (Virginia Tech)

Pre-print File Attached

11:00 - 11:20
Talk

Testing Regex Generalizability And Its Implications: A Large-Scale Many-Language Measurement Study

James C. Davis (Virginia Tech, USA), Daniel Moyer (Virginia Tech), Ayaan M. Kazerouni (Virginia Tech), Dongyoon Lee (Stony Brook University)

Pre-print File Attached

11:20 - 11:40
Talk

Accurate String Constraints Solution Counting with Weighted Automata

Elena Sherman (Boise State University), Andrew Harris (Boise State University)

11:40 - 12:00
Talk

Subformula Caching for Model Counting and Quantitative Program Analysis

William Eiers (University of California at Santa Barbara, USA), Seemanta Saha (University of California Santa Barbara), Tegan Brennan (University of California, Santa Barbara), Tevfik Bultan (University of California, Santa Barbara)

12:00 - 12:10
Demonstration

SPrinter: A Static Checker for Finding Smart Pointer Errors in C++ Programs

Xutong Ma (Institute of Software, Chinese Academy of Sciences), Jiwei Yan (Institute of Software, Chinese Academy of Sciences), Yaqi Li (Institute of Software, Chinese Academy of Sciences), Jun Yan (Institute of Software, Chinese Academy of Sciences), Jian Zhang (Institute of Software, Chinese Academy of Sciences)

12:10 - 12:20
Demonstration

FPChecker: Detecting Floating-Point Exceptions in GPU Applications

Ignacio Laguna (Lawrence Livermore National Laboratory)