Test-based automated program repair (APR) has attracted significant attention from both industry and academia. Despite the progress made in recent studies, the overfitting problem (i.e., a generated patch is plausible but overfitting) remains a major and long-standing challenge. Consequently, many automated techniques have been proposed to assess the correctness of patches, either during the patch generation phase or in the evaluation of APR techniques. However, the effectiveness of these existing techniques has not been systematically compared, and little is known about their advantages and disadvantages. To fill this gap, we performed a large-scale empirical study. Specifically, we systematically investigated the effectiveness of existing automated patch correctness assessment techniques, both static and dynamic, based on 902 patches automatically generated by 21 APR tools from 4 different categories (the largest benchmark in the literature to date). Our empirical study revealed the following major findings: (1) static code features capturing patch syntax and semantics are generally effective in differentiating overfitting patches from correct ones; (2) dynamic techniques generally achieve high precision, while heuristics based on static code features are more effective with respect to recall; (3) existing techniques are more effective on certain projects and certain types of APR techniques than on others; (4) existing techniques are highly complementary to each other: a single technique can detect at most 53.5% of the overfitting patches, while 93.3% of them can be detected by at least one technique. Based on these findings, we designed an integration strategy that first integrates static code features via learning and then combines the result with other techniques via majority voting.
Our experiments show that the strategy can enhance the performance of existing patch correctness assessment techniques significantly.
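As a rough illustration of the final combination step described above, the following is a minimal majority-voting sketch. The function name and the example verdicts are hypothetical, not taken from the paper's implementation:

```python
# Illustrative sketch (not the authors' implementation): combining the
# verdicts of several patch correctness assessment techniques by
# majority voting. Each technique labels a patch either "overfitting"
# or "correct"; the patch is flagged when a strict majority agrees.

def majority_vote(verdicts):
    """Return 'overfitting' if most techniques flag the patch."""
    flags = sum(1 for v in verdicts if v == "overfitting")
    return "overfitting" if flags > len(verdicts) / 2 else "correct"

# Hypothetical verdicts from three assessment techniques for one patch
print(majority_vote(["overfitting", "correct", "overfitting"]))  # -> overfitting
```

A learned classifier over static code features could serve as one voter alongside the dynamic techniques, which matches the integration order sketched in the abstract.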
Thu 24 Sep (all times in UTC)

08:00 - 09:00: Bugs and Automated Repair (Research Papers), Kangaroo room. Chair(s): Jifeng Xuan (Wuhan University)

08:00 - 08:20 Talk: No Strings Attached: An Empirical Study of String-related Software Bugs

08:20 - 08:40 Research paper: Automated Patch Correctness Assessment: How Far are We? Shangwen Wang (National University of Defense Technology), Ming Wen (Huazhong University of Science and Technology), Bo Lin (National University of Defense Technology), Hongjun Wu (National University of Defense Technology), Yihao Qin (National University of Defense Technology), Deqing Zou (Huazhong University of Science and Technology), Xiaoguang Mao (National University of Defense Technology), Hai Jin (Huazhong University of Science and Technology)

08:40 - 09:00 Research paper: Evaluating Representation Learning of Code Changes for Predicting Patch Correctness in Program Repair. Haoye Tian (University of Luxembourg), Kui Liu (University of Luxembourg), Abdoul Kader Kaboré (University of Luxembourg), Anil Koyuncu (University of Luxembourg), Li Li (Monash University), Jacques Klein (University of Luxembourg), Tegawendé F. Bissyandé (University of Luxembourg)