
Registered user since Wed 14 Apr 2021
Contributions
Research Papers
Thu 13 Oct 2022 10:30 - 10:50 at Banquet A - Technical Session 22 - Code Summarization and Recommendation Chair(s): Houari Sahraoui
Prior studies have demonstrated the need for approaches that generate an answer summary for a given technical query on Software Question and Answer (SQA) sites. We find that existing approaches are assessed solely through user studies. Hence, a new user study must be performed every time a new approach is introduced; this is time-consuming, slows down the development of the new approach, and makes results from different user studies hard to compare. There is a need for a benchmark with ground-truth summaries to complement assessment through user studies. Unfortunately, no such benchmark exists for answer summarization for technical queries from SQA sites.
To fill this gap, we manually construct a high-quality benchmark to enable automatic evaluation of answer summarization for technical queries on SQA sites. It contains 111 query-summary pairs extracted from 382 Stack Overflow answers comprising 2,014 candidate sentences. Using the benchmark, we comprehensively evaluate the performance of existing approaches and find that there is still substantial room for improvement.
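As a concrete illustration of what such a benchmark enables, ROUGE scoring of a generated summary against a ground-truth summary can be fully automated with standard tooling. Below is a minimal sketch using the open-source rouge-score package; the pair layout and the example strings are placeholders, not the benchmark's actual schema.

# pip install rouge-score
from rouge_score import rouge_scorer

# Hypothetical query-summary pair; the real benchmark contains 111 of these.
benchmark = [{
    "reference": "Use a mature JSON library such as Jackson or Gson.",
    "generated": "Jackson's ObjectMapper is a common way to parse JSON in Java.",
}]

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=True)

for pair in benchmark:
    scores = scorer.score(pair["reference"], pair["generated"])
    for metric, result in scores.items():
        print(f"{metric}: F1 = {result.fmeasure:.4f}")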
Motivated by these results, we propose a new approach, TechSumBot, with three key modules: 1) a Usefulness Ranking module, 2) a Centrality Estimation module, and 3) a Redundancy Removal module. We evaluate TechSumBot both automatically (i.e., using our benchmark) and manually (i.e., via a user study). The results of both evaluations consistently demonstrate that TechSumBot outperforms the best-performing baselines from both the SE and NLP domains by a large margin: by 10.83%–14.90%, 32.75%–36.59%, and 12.61%–17.54% in terms of ROUGE-1, ROUGE-2, and ROUGE-L in the automatic evaluation, and by 5.79%–9.23% and 17.03%–17.68% in terms of average usefulness and diversity scores in the human evaluation. This highlights that automatic evaluation on our benchmark can uncover findings similar to those of user studies, at a much lower cost, especially when assessing a new approach. Additionally, an ablation study demonstrates that each module contributes to boosting TechSumBot's overall performance. We release the benchmark as well as the replication package of our experiments at https://anonymous.4open.science/r/TECHSUMBOT.
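To make the three-module design concrete, here is a minimal sketch of how such an extractive pipeline could be wired together. The usefulness and centrality scorers below are simple TF-IDF placeholders standing in for the paper's trained modules, and the equal weighting and similarity threshold are assumptions.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def summarize(query, candidates, k=5, redundancy_threshold=0.6):
    """Rank candidate sentences and return a k-sentence summary."""
    vec = TfidfVectorizer().fit([query] + candidates)
    q = vec.transform([query])
    s = vec.transform(candidates)

    # 1) Usefulness Ranking (placeholder): relevance of each sentence
    #    to the query.
    usefulness = cosine_similarity(s, q).ravel()

    # 2) Centrality Estimation (placeholder): similarity to the centroid
    #    of all candidate sentences.
    centroid = np.asarray(s.mean(axis=0))
    centrality = cosine_similarity(s, centroid).ravel()

    combined = 0.5 * usefulness + 0.5 * centrality  # assumed equal weighting

    # 3) Redundancy Removal: greedily take top-scored sentences, skipping
    #    any that are too similar to an already-selected one.
    selected = []
    for i in np.argsort(-combined):
        if all(cosine_similarity(s[i], s[j])[0, 0] < redundancy_threshold
               for j in selected):
            selected.append(i)
        if len(selected) == k:
            break
    return [candidates[i] for i in sorted(selected)]

The greedy threshold in step 3 is one common way to enforce diversity in extractive summarization; the paper's actual Redundancy Removal module may work differently.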
Research Papers
Thu 13 Oct 2022 16:40 - 17:00 at Ballroom C East - Technical Session 29 - AI for SE II Chair(s): Tim Menzies
Although large pre-trained models of code have delivered significant advancements in various code processing tasks, an impediment remains to the wide and fluent adoption of these powerful models in the daily workflow of software developers: the models consume hundreds of megabytes of memory and run slowly, especially on personal devices, which complicates model deployment and greatly degrades the user experience.
This motivates us to propose Compressor, a novel approach that compresses pre-trained models of code into extremely small models with negligible performance sacrifice. Our method formulates the design of tiny models as simplifying the pre-trained model architecture: searching for a significantly smaller model that follows an architectural design similar to the original pre-trained model. To tackle this problem, Compressor uses a genetic algorithm (GA)-based strategy to guide the simplification process. Prior studies found that models with higher computational cost tend to be more powerful. Inspired by this insight, the GA is designed to maximize a model's giga floating-point operations (GFLOPs), an indicator of computational cost, while satisfying the constraint on the target model size. Then, we use knowledge distillation to train the small model: unlabelled data is fed into the large model, and its outputs are used as labels to train the small model. We evaluate Compressor with two state-of-the-art pre-trained models, i.e., CodeBERT and GraphCodeBERT, on two important tasks, i.e., vulnerability prediction and clone detection. We compress the models to 3 MB, only 0.6% of the original model size. The results show that the compressed CodeBERT and GraphCodeBERT reduce inference latency by 70.75% and 79.21%, respectively. More importantly, they maintain 96.15% and 97.74% of the original performance on the vulnerability prediction task, and even higher ratios (99.20% and 97.52%) on the clone detection task.
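A minimal sketch of the GA-guided architecture search described above: genomes encode a few transformer hyperparameters, fitness is estimated GFLOPs, and candidates breaching the 3 MB budget are penalized. The parameter-count and FLOPs formulas are rough BERT-style estimates, and the tiny vocabulary, sequence length, and GA operators are placeholder assumptions, not the paper's configuration.

import random

TARGET_BYTES = 3 * 1024 ** 2   # 3 MB target model size
BYTES_PER_PARAM = 4            # float32 weights (assumption)
SEQ_LEN = 128                  # assumed input length for the FLOPs estimate

def num_params(layers, hidden, intermediate, vocab=1000):
    """Rough BERT-style parameter count (embeddings + encoder layers)."""
    embed = vocab * hidden
    per_layer = 4 * hidden * hidden + 2 * hidden * intermediate
    return embed + layers * per_layer

def gflops(layers, hidden, intermediate):
    """Rough forward-pass cost: ~2 FLOPs per weight per token."""
    per_layer = 4 * hidden * hidden + 2 * hidden * intermediate
    return 2 * SEQ_LEN * layers * per_layer / 1e9

def fitness(genome):
    if num_params(*genome) * BYTES_PER_PARAM > TARGET_BYTES:
        return -1.0            # violates the size constraint
    return gflops(*genome)     # maximize compute under the budget

def mutate(genome):
    layers, hidden, intermediate = genome
    choice = random.randrange(3)
    if choice == 0:
        layers = max(1, layers + random.choice([-1, 1]))
    elif choice == 1:
        hidden = max(16, hidden + random.choice([-16, 16]))
    else:
        intermediate = max(32, intermediate + random.choice([-32, 32]))
    return (layers, hidden, intermediate)

def crossover(a, b):
    return tuple(random.choice(pair) for pair in zip(a, b))

population = [(random.randint(1, 6), random.randrange(16, 256, 16),
               random.randrange(32, 1024, 32)) for _ in range(50)]
for _ in range(100):           # generations
    population.sort(key=fitness, reverse=True)
    parents = population[:10]  # elitist selection
    population = parents + [mutate(crossover(random.choice(parents),
                                             random.choice(parents)))
                            for _ in range(40)]

best = max(population, key=fitness)
print(f"best architecture {best}: {gflops(*best):.3f} GFLOPs, "
      f"{num_params(*best) * BYTES_PER_PARAM / 1024 ** 2:.2f} MB")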
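And a sketch of the knowledge-distillation step, which trains the small model on the large model's outputs over unlabelled data. The abstract only states that the teacher's outputs serve as labels; the temperature-scaled KL loss below is the standard Hinton-style formulation and is an assumption, as is the model(batch) calling convention.

import torch
import torch.nn.functional as F

def distill(teacher, student, unlabeled_loader, epochs=3, temperature=2.0):
    """Train the small model to mimic the large model's output distribution."""
    teacher.eval()
    optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)
    for _ in range(epochs):
        for batch in unlabeled_loader:           # no ground-truth labels needed
            with torch.no_grad():
                teacher_logits = teacher(batch)  # soft labels from the big model
            student_logits = student(batch)
            loss = F.kl_div(
                F.log_softmax(student_logits / temperature, dim=-1),
                F.softmax(teacher_logits / temperature, dim=-1),
                reduction="batchmean",
            ) * temperature ** 2
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student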