Service architecture adoption is widespread and brings many benefits, such as agile development and immutable infrastructure. However, the resulting service ecosystem is hard to govern and understand, because each team's applications evolve independently and differently (e.g., in features and development methods). In this paper, we present an approach to model and process an application ecosystem as a knowledge graph. The application knowledge graph can help with architectural visibility, operational efficiency, and developer productivity.
Machine-learning-based, data-driven solutions have the potential to significantly improve the quality of the incident management process and make it cost-effective. We present our experiences addressing a spectrum of interrelated problems encountered in practice, including identifying semantically related incidents, recommending assignees, and mapping incidents to business processes. We argue that, despite long-standing research, it is not always straightforward to adopt recommendations from research into practice, owing to the variability and complexity of business constraints and the nature of the data. We discuss the need for meta-level analysis and offer our own recommendations for designing pragmatic solutions that have low barriers to adoption and address the right level of challenges.
Software defect prediction is still a challenging task in industrial settings. Noisy data and/or lack of data make it hard to build successful prediction models. In this study, we aim to build a change-level defect prediction model for a software project in an industrial setting. We combine various probabilistic models, namely matrix factorization and topic modeling, with the expectation of overcoming the noisy and limited nature of industrial data by extracting hidden features from multiple resources. Commit-level process metrics, latent features from commits, and semantic features from commit messages are combined to build the defect predictors, using log filtering and feature selection techniques together with two machine learning algorithms, Naive Bayes and Extreme Gradient Boosting (XGBoost). Statistical tests show that collecting data from various sources and applying data pre-processing techniques improve the probability of detection by up to 24% compared to a base model that uses process metrics only.
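The feature-combination idea described above can be sketched as follows. This is a minimal illustration with made-up commit data, not the paper's pipeline: `TruncatedSVD` stands in for the matrix-factorization component, scikit-learn's `LatentDirichletAllocation` stands in for the topic model, and Gaussian Naive Bayes is used in place of the paper's Naive Bayes/XGBoost pair; all names and numbers are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD, LatentDirichletAllocation
from sklearn.naive_bayes import GaussianNB

# Hypothetical commit messages and per-commit process metrics
# (e.g., files touched, lines changed); labels mark defect-inducing commits.
messages = ["fix null pointer crash", "add login page", "fix memory leak",
            "update docs", "fix race condition", "refactor parser"]
process = np.array([[3, 120], [1, 40], [4, 200], [1, 10], [5, 300], [2, 80]])
labels = np.array([1, 0, 1, 0, 1, 0])

counts = CountVectorizer().fit_transform(messages)

# Latent features from the commit-term matrix (matrix-factorization stand-in)
latent = TruncatedSVD(n_components=2, random_state=0).fit_transform(counts)
# Semantic features: per-commit topic distributions from the messages
topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)

# Combine process, latent, and semantic features into one matrix
X = np.hstack([process, latent, topics])
clf = GaussianNB().fit(X, labels)
print(clf.predict(X[:1]))
```

In practice the paper reports that such combined features, after log filtering and feature selection, outperform a process-metrics-only baseline.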
A lot of research in Software Engineering (SE) automatically extracts topics from text data and uses the results directly or as features for a machine learning method. Research has shown that the majority of studies in SE use Latent Dirichlet Allocation (LDA) as the topic modeling approach. Similarly, there is a lot of work that applies LDA to GitHub data. However, no study explores whether LDA is a good choice compared to other algorithms, nor investigates the effects of specific pre-processing steps on its performance. In this paper, we explore a large dataset of GitHub repositories and apply two main topic modeling algorithms, LDA (three variants) and Non-Negative Matrix Factorization (NMF), in several experiments with different experimental settings. The results show that LDA achieves a higher coherence score than NMF. However, care should be taken in the choice of LDA algorithm, the setting of its parameters, and the text pre-processing steps. The results of this paper benefit SE researchers who apply intelligent techniques using LDA.
Software has become an essential component of modern life, but when software vulnerabilities threaten the security of users, new ways of analyzing software security must be explored. Using the National Institute of Standards and Technology's Juliet Java Suite, which contains thousands of examples of defective Java methods covering a variety of vulnerabilities, we developed a prototype tool that implements an array of Long Short-Term Memory (LSTM) Recurrent Neural Networks to detect vulnerabilities within source code. The tool employs various data preparation methods so as to be independent of coding style and automates the process of extracting methods, labeling data, and partitioning the dataset. The result is a prototype command-line utility that generates an n-dimensional vulnerability prediction vector. The experimental evaluation using 44,495 test cases indicates that the tool can achieve an accuracy higher than 90% for 24 out of 29 different types of CWE vulnerabilities.
Aligning the design of a system with its implementation improves product quality and simplifies product evolution. While developers are empowered with AI/ML-augmented tools and techniques that increasingly assist them in implementation tasks, the abstraction gap between code and design limits automation for design tasks. In this position paper, we argue that the software engineering community can take advantage of the experience built with AI/ML techniques to advance automation in design analysis. In particular, combining multiple techniques shows promise. We summarize research challenges along the way and exemplify two such efforts that apply machine learning to codebases to extract design constructs and detect deviations from intended designs, and that use search-based refactoring on graph databases.
More and more tasks become solvable using deep learning technology nowadays. Consequently, the amount of neural network code in software rises continuously. To make the new paradigm more accessible, frameworks, languages, and tools keep emerging. Although the maturity of these tools is steadily increasing, we still lack appropriate domain-specific languages and a high degree of automation when it comes to deep learning for production systems. In this paper we present a multi-paradigm language family that allows the AI engineer to model and train deep neural networks as well as to integrate them into software architectures containing classical code. Using input and output layers as strictly typed interfaces enables a seamless embedding of neural networks into component-based models. The lifecycle of deep learning components can then be governed by a compiler accordingly, e.g., detecting when (re-)training is necessary or when network weights can be shared between different network instances. We provide a compelling case study in which we train an autonomous vehicle for the TORCS simulator. Furthermore, we discuss how the methodology automates the AI development process when neural networks are changed or added to the system.
Software Quality and Context for Rich Source Code Representations.