EditSum: A Retrieve-and-Edit Framework for Source Code Summarization (ASE 2021 - Research Papers)

Who

Jia Allen Li, Yongmin Li, Ge Li, Xing Hu, Xin Xia, Zhi Jin

Track

ASE 2021 Research Papers

Time Zone

The program is currently displayed in (GMT+11:00) Hobart.

When

Tue 16 Nov 2021 19:00 - 19:20 at Kangaroo - Code. Chair(s): Michael Pradel

Abstract

Existing studies show that code summaries help developers understand and maintain source code. Unfortunately, these summaries are often mismatched, missing or outdated in software projects. Code summarization aims to automatically generate brief and accurate natural language descriptions for source code. According to Gros et al., code summaries are highly structured and have many repetitive patterns; for example, they often begin with patterns like “return true if…” and “create a new…”. The promising results obtained by previous approaches also prove the existence of these patternized words. Besides the patternized words, a code summary also contains important keywords, which are the key to reflecting the functionality of the code. However, the state-of-the-art code summarization approaches perform poorly at predicting the keywords, which causes the generated summaries to lose informativeness. To alleviate this problem, this paper proposes a novel retrieve-and-edit approach named EditSum for code summarization. Specifically, EditSum first retrieves a similar code snippet from a pre-defined corpus and treats its summary as a prototype summary to learn the pattern. Then, EditSum edits the prototype automatically to combine the pattern in the prototype with the semantic information of the input code. Our motivation is that the retrieved prototype provides a good starting point for post-generation because the summaries of similar code snippets often have the same pattern. The post-editing process further reuses the patternized words in the prototype and generates keywords based on the semantic information of the code. We conduct experiments on a large-scale Java corpus, which contains about 2M samples, and experimental results demonstrate that EditSum outperforms the state-of-the-art approaches by a substantial margin. The human evaluation also proves that the summaries generated by EditSum are more informative and useful. We also verify that EditSum performs well at predicting the patternized words and keywords. The code and data will be open-sourced.
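
The retrieval step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how similarity between code snippets is measured, so the token-level Jaccard similarity used here is an assumption chosen for simplicity, and the names tokenize, jaccard, retrieve_prototype and CORPUS are hypothetical. The editing step, which rewrites the prototype using the semantics of the input code, is a learned model in the paper and is not sketched here.

```python
import re


def tokenize(code):
    """Return the set of lowercase word tokens in a snippet, splitting camelCase."""
    tokens = set()
    for word in re.findall(r"[A-Za-z]+", code):
        # Split identifiers such as "isEmpty" into "is" and "empty".
        for part in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word):
            tokens.add(part.lower())
    return tokens


def jaccard(a, b):
    """Jaccard similarity of two token sets (an assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_prototype(query_code, corpus):
    """Return the summary paired with the corpus snippet most similar to query_code.

    corpus is a list of (code, summary) pairs.
    """
    query_tokens = tokenize(query_code)
    _, summary = max(corpus, key=lambda pair: jaccard(query_tokens, tokenize(pair[0])))
    return summary


# Hypothetical two-entry corpus, for illustration only.
CORPUS = [
    ("boolean isEmpty() { return size == 0; }",
     "return true if the list is empty"),
    ("Node createNode(String name) { return new Node(name); }",
     "create a new node with the given name"),
]

query = "boolean isFull() { return size == capacity; }"
prototype = retrieve_prototype(query, CORPUS)
print(prototype)
```

In this toy case the retrieved prototype supplies the pattern "return true if…", while the keyword "empty" is wrong for the query; replacing such keywords based on the input code is the job the abstract assigns to the editing step.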

Jia Allen Li

Peking University

Yongmin Li

Peking University

Ge Li

Peking University

Xing Hu

Zhejiang University

Xin Xia

Huawei Software Engineering Application Technology Lab

China

Zhi Jin

Peking University

China

Session Program

Tue 16 Nov
Displayed time zone: Hobart

19:00 - 20:00	Code: Tool Demonstrations / Research Papers / NIER track at Kangaroo. Chair(s): Michael Pradel, University of Stuttgart

19:00 20m Talk		EditSum: A Retrieve-and-Edit Framework for Source Code Summarization Research Papers Jia Allen Li Peking University, Yongmin Li Peking University, Ge Li Peking University, Xing Hu Zhejiang University, Xin Xia Huawei Software Engineering Application Technology Lab, Zhi Jin Peking University
19:20 20m Talk		Interactive Cross-language Code Retrieval with Auto-Encoders Research Papers Binger Chen Technische Universität Berlin, Ziawasch Abedjan Leibniz Universität Hannover
19:40 10m Talk		Did You Do Your Homework? Raising Awareness on Software Fairness and Discrimination NIER track Max Hort University College London, Federica Sarro University College London
19:50 5m Talk		Quito: a Coverage-Guided Test Generator for Quantum Programs Tool Demonstrations Xinyi Wang Nanjing University of Aeronautics and Astronautics, Nanjing, China, Paolo Arcaini National Institute of Informatics, Tao Yue Nanjing University of Aeronautics and Astronautics, Shaukat Ali Simula Research Laboratory, Norway
19:55 5m Talk		Revizor: A Data-Driven Approach to Automate Frequent Code Changes Based on Graph Matching Tool Demonstrations Oleg Smirnov JetBrains Research, Saint Petersburg State University, Artyom Lobanov JetBrains Research, Yaroslav Golubev JetBrains Research, Elena Tikhomirova JetBrains Research, Timofey Bryksin JetBrains Research; HSE University
