Empirical evaluation of the impact of class overlap on software defect prediction (ASE 2019 Research Papers)


Sun 10 - Fri 15 November 2019, San Diego, California, United States

Who

Lina Gong, Shujuan Jiang, Rongcun Wang, Li Jiang

Track

ASE 2019 Research Papers

When

Wed 13 Nov 2019, 16:20 - 16:40, at Cortez 1 (Prediction session). Chair(s): Xin Xia

Abstract

Software defect prediction (SDP) uses learning models to detect defective modules in a project, and the performance of these models depends on the quality of the training data. Previous research has mainly focused on two data-quality problems: class imbalance and feature redundancy. However, training data often contain instances that belong to different classes but have similar feature values; this class overlap also degrades the quality of the training data. Our goal is to investigate the impact of class overlap on software defect prediction. We also propose an improved K-Means clustering cleaning approach (IKMCCA) that addresses both the class overlap and class imbalance problems. Specifically, we examine whether the K-Means clustering cleaning approach (KMCCA), neighborhood cleaning learning (NCL), or IKMCCA improves defect detection performance in two settings: (i) within-project defect prediction (WPDP) and (ii) cross-project defect prediction (CPDP). To obtain an objective estimate of the impact of class overlap, we conduct our investigation on 28 open source projects and compare the performance of state-of-the-art learning models in both settings when the training data are cleaned with IKMCCA, KMCCA, or NCL versus left uncleaned. The experimental results show that learning models achieve significantly better performance in terms of balance, Recall, and AUC for both WPDP and CPDP when the overlapping instances are removed. Moreover, the results indicate that it is better to address class overlap and class imbalance together.
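The abstract does not describe how KMCCA or IKMCCA work internally, so the sketch below only illustrates the general idea of cluster-based overlap cleaning under stated assumptions: instances are grouped with K-Means, and within a cluster that mixes both classes, instances carrying the cluster's minority label are treated as overlapping and dropped. The function names `kmeans` and `clean_overlap`, the deterministic farthest-first initialization, and the majority-vote removal rule are all hypothetical choices made for this illustration; this is not the paper's IKMCCA or KMCCA.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - z) ** 2 for x, z in zip(a, b))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def kmeans(points, k, iters=100):
    """Lloyd's K-Means with deterministic farthest-first initialization.

    Returns one cluster index per point.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def clean_overlap(X, y, k):
    """Hypothetical cluster-based overlap filter for binary labels.

    Within each K-Means cluster that mixes both classes, instances with the
    cluster's minority label are treated as overlapping and removed.
    Pure clusters and clusters with a tied vote are left untouched.
    """
    labels = kmeans(X, k)
    keep = []
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        top = Counter(y[i] for i in idx).most_common(2)
        if len(top) < 2 or top[0][1] == top[1][1]:
            keep.extend(idx)  # pure cluster or tie: nothing to remove
        else:
            keep.extend(i for i in idx if y[i] == top[0][0])
    keep.sort()  # restore the original instance order
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: two well-separated groups, each containing one "overlapping" label.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
y = [0, 0, 0, 1, 1, 1, 1, 0]  # 1 = defective module
X_clean, y_clean = clean_overlap(X, y, k=2)
print(y_clean)  # [0, 0, 0, 1, 1, 1]
```

On real defect data, where defective modules are usually the minority class, a plain majority-vote rule like this one can also discard genuinely defective modules and make the imbalance worse. That is one plausible reason to handle class imbalance alongside class overlap rather than cleaning overlap alone.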

Lina Gong

China University of Mining and Technology

Shujuan Jiang

China University of Mining and Technology

Rongcun Wang

China University of Mining and Technology

Li Jiang

China University of Mining and Technology

Session Program

Wed 13 Nov

16:00 - 17:40: Papers - Prediction at Cortez 1
Chair(s): Xin Xia, Monash University

16:00 - 16:20
Talk

Predicting Licenses for Changed Source Code

Xiaoyu Liu, Department of Computer Science and Engineering, Southern Methodist University; Liguo Huang, Dept. of Computer Science, Southern Methodist University, Dallas, TX, 75205; Jidong Ge, State Key Laboratory for Novel Software and Technology, Nanjing University; Vincent Ng, Human Language Technology Research Institute, University of Texas at Dallas, Richardson, TX 75083-0688

16:20 - 16:40
Talk

Empirical evaluation of the impact of class overlap on software defect prediction

Lina Gong, China University of Mining and Technology; Shujuan Jiang, China University of Mining and Technology; Rongcun Wang, China University of Mining and Technology; Li Jiang, China University of Mining and Technology

16:40 - 17:00
Talk

Combining Program Analysis and Statistical Language Model for Code Statement Completion

Son Nguyen, The University of Texas at Dallas; Tien N. Nguyen, University of Texas at Dallas; Yi Li, New Jersey Institute of Technology, USA; Shaohua Wang, New Jersey Institute of Technology, USA

17:00 - 17:20
Talk

Balancing the trade-off between accuracy and interpretability in software defect prediction

Toshiki Mori, Corporate Software Engineering & Technology Center, Toshiba Corporation; Naoshi Uchihira, School of Knowledge Science, Japan Advanced Institute of Science and Technology (JAIST)


17:20 - 17:40
Talk

Fine-grained just-in-time defect prediction

Luca Pascarella, Delft University of Technology; Fabio Palomba, Department of Informatics, University of Zurich; Alberto Bacchelli, University of Zurich
