ASE 2019
Sun 10 - Fri 15 November 2019 San Diego, California, United States
Fri 15 Nov 2019 12:00 - 12:30 at Cortez 3 - Software Engineering Intelligence via NLP

Much research in Software Engineering (SE) automatically extracts topics from text data and uses the results either directly or as features for a machine learning method. Prior research has shown that the majority of SE studies use Latent Dirichlet Allocation (LDA) as the topic modeling approach. Similarly, a large body of work applies LDA to GitHub data. However, no study has explored whether LDA is a good choice compared to other algorithms, nor has any investigated the effects of specific pre-processing steps on its performance. In this paper, we explore a large dataset of GitHub repositories and apply two main topic modeling algorithms, LDA (3 variants) and Non-Negative Matrix Factorization (NMF), in several experiments with different experimental settings. The results show that LDA achieves a higher coherence score than NMF. However, care should be taken in choosing the LDA algorithm, setting its parameters, and selecting the text pre-processing steps. The results of this paper benefit SE researchers who apply intelligent techniques using LDA.
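To make the notion of a coherence score concrete, here is a minimal sketch of one common measure, UMass coherence, computed over a toy corpus. The abstract does not state which coherence measure the paper uses, so the choice of UMass is an assumption, and the function name `umass_coherence`, the sample documents, and the two topics are all hypothetical illustrations rather than the authors' setup.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence of one topic's top words over tokenized documents.

    top_words is assumed to be ordered from most to least probable.
    Higher (less negative) scores mean the words co-occur more often.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every given word.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_freq(top_words[j])
            if d_j == 0:
                continue  # skip conditioning words absent from the corpus
            co = doc_freq(top_words[i], top_words[j])
            score += math.log((co + 1) / d_j)  # +1 smoothing avoids log(0)
    return score


# Hypothetical repository descriptions, already tokenized.
docs = [
    "python library for parsing json files".split(),
    "fast json parsing library written in rust".split(),
    "android app for tracking fitness workouts".split(),
    "ios and android fitness tracking app".split(),
]

coherent_topic = ["json", "parsing", "library"]
mixed_topic = ["json", "fitness", "rust"]

print(umass_coherence(coherent_topic, docs))
print(umass_coherence(mixed_topic, docs))
```

A topic whose top words appear together in the same documents scores higher than one mixing unrelated words. Under this assumption, comparing LDA with NMF would mean averaging such scores over the topics each model produces.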

Fri 15 Nov

SEI-2019-papers
11:00 - 12:30: SEI 2019 - Software Engineering Intelligence via NLP at Cortez 3
11:00 - 11:30
Talk
Mining Text in Incident Repositories: Experiences and Perspectives on Adopting Machine Learning Solutions in Practice.
11:30 - 12:00
Talk
Predicting Defects with Latent and Semantic Features from Commit Logs in an Industrial Setting.
12:00 - 12:30
Talk
Where Does LDA Sit for GitHub?