Assessing the Generalizability of code2vec Token Embeddings (ASE 2019 Research Papers)

Sun 10 - Fri 15 November 2019 San Diego, California, United States

Who

Kang Hong Jin, Tegawendé F. Bissyandé, David Lo

Track

ASE 2019 Research Papers

When

Tue 12 Nov 2019 10:40 - 11:00 at Cortez 2&3 - AI and SE Chair(s): Kaiyuan Wang

Abstract

Many Natural Language Processing (NLP) tasks, such as sentiment analysis or syntactic parsing, have benefited from the development of word embedding models. In particular, regardless of the training algorithms, the learned embeddings have often been shown to be generalizable to different NLP tasks. In contrast, despite recent momentum on word embeddings for source code, the literature lacks evidence of their generalizability beyond the example task they have been trained for.

In this experience paper, we identify 3 potential downstream tasks that source code token embedding models can be applied to, namely code comment generation, code authorship identification, and code clone detection. We empirically assess a recently proposed code token embedding model, namely code2vec’s token embeddings. code2vec was trained on the task of predicting method names; although the vectors it learns could potentially be used for other tasks, this potential has not been explored in the literature. We therefore fill this gap by examining the generalizability of these embeddings on the tasks we have identified. Ultimately, we show that source code token embeddings cannot be readily leveraged for the downstream tasks. Our experiments even show that our attempts to use them do not result in any improvements over less sophisticated methods. We call for more research into effective and general use of code embeddings.
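As a rough illustration of what "applying token embeddings to a downstream task" can mean, the sketch below represents a code snippet as the mean of its token vectors and compares two snippets by cosine similarity, a simple baseline one might try for clone detection. It is not the method evaluated in the paper: the toy vectors in `TOY_EMBEDDINGS` and the helper names are hypothetical, and real code2vec token embeddings would have to be loaded from a trained model.

```python
import math

# Hypothetical 3-dimensional toy vectors, for illustration only.
# Real token embeddings are learned and have far more dimensions.
TOY_EMBEDDINGS = {
    "get": [0.9, 0.1, 0.0],
    "set": [0.1, 0.9, 0.0],
    "name": [0.0, 0.2, 0.9],
    "value": [0.1, 0.1, 0.8],
}


def mean_vector(tokens, embeddings):
    """Average the vectors of known tokens; out-of-vocabulary tokens are skipped."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def snippet_similarity(tokens_a, tokens_b, embeddings):
    """Similarity of two token sequences via their mean embedding vectors."""
    vec_a = mean_vector(tokens_a, embeddings)
    vec_b = mean_vector(tokens_b, embeddings)
    if vec_a is None or vec_b is None:
        return 0.0  # assumption: treat snippets with no known tokens as dissimilar
    return cosine_similarity(vec_a, vec_b)


if __name__ == "__main__":
    getter = ["get", "name"]
    other_getter = ["get", "value"]
    setter = ["set", "value"]
    print(snippet_similarity(getter, other_getter, TOY_EMBEDDINGS))
    print(snippet_similarity(getter, setter, TOY_EMBEDDINGS))
```

Averaging discards token order and structure, which is one plausible reason such a simple use of embeddings may not beat less sophisticated baselines.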

Link to Preprint

http://www.mysmu.edu/faculty/davidlo/papers/ase19-code2vec.pdf

Kang Hong Jin

School of Information Systems, Singapore Management University

Tegawendé F. Bissyandé

SnT, University of Luxembourg

Luxembourg

David Lo

Singapore Management University

Singapore

Session Program

Tue 12 Nov

10:40 - 12:20: Papers - AI and SE at Cortez 2&3
Chair(s): Kaiyuan Wang (Google, Inc.)

10:40 - 11:00
Talk

Assessing the Generalizability of code2vec Token Embeddings

Kang Hong Jin (School of Information Systems, Singapore Management University); Tegawendé F. Bissyandé (SnT, University of Luxembourg); David Lo (Singapore Management University)

11:00 - 11:20
Talk

Multi-Modal Attention Network Learning for Semantic Source Code Retrieval

Yao Wan (Zhejiang University); Jingdong Shu (Zhejiang University); Yulei Sui (University of Technology Sydney, Australia); Guandong Xu (University of Technology, Sydney); Zhou Zhao (Zhejiang University); Jian Wu (Zhejiang University); Philip Yu (University of Illinois at Chicago)

11:20 - 11:40
Talk

Experience Paper: Search-based Testing in Automated Driving Control Applications (ACM SIGSOFT Distinguished Paper Award)

Christoph Gladisch (Corporate Research, Robert Bosch GmbH); Thomas Heinz (Corporate Research, Robert Bosch GmbH); Christian Heinzemann (Corporate Research, Robert Bosch GmbH); Jens Oehlerking (Corporate Research, Robert Bosch GmbH); Anne von Vietinghoff (Corporate Research, Robert Bosch GmbH); Tim Pfitzer (Robert Bosch Automotive Steering GmbH)

11:40 - 12:00
Talk

Machine Translation-Based Bug Localization Technique for Bridging Lexical Gap

Yan Xiao (Department of Computer Science, City University of Hong Kong); Jacky Keung (Department of Computer Science, City University of Hong Kong); Kwabena E. Bennin (Blekinge Institute of Technology, SERL Sweden); Qing Mi (Department of Computer Science, City University of Hong Kong)

12:00 - 12:10
Talk

AutoFocus: Interpreting Attention-based Neural Networks by Code Perturbation

Nghi Duy Quoc Bui (Singapore Management University, Singapore); Yijun Yu (The Open University, UK); Lingxiao Jiang (Singapore Management University)

12:10 - 12:20
Demonstration

A Quantitative Analysis Framework for Recurrent Neural Network

Xiaoning Du (Nanyang Technological University); Xiaofei Xie (Nanyang Technological University); Yi Li (Nanyang Technological University); Lei Ma (Kyushu University); Yang Liu (Nanyang Technological University, Singapore); Jianjun Zhao (Kyushu University)