Registered user since Fri 16 Apr 2021
Contributions
Tutorials
Fri 14 Oct 2022 08:30 - 10:00 at Gold C - AI Quality Assurance I (Part I)
Data-driven AI (e.g., deep learning) has become a driving force behind applications across diverse domains. Its human-competitive performance has made deep learning (DL) models core components of complex software systems for tasks such as computer vision (CV) and natural language processing (NLP). As ever more powerful and complex DL models are deployed, there is a pressing need to ensure the quality and reliability of these AI systems. However, their data-driven paradigm and black-box nature make such AI software fundamentally different from classical software, so new quality assurance techniques for AI-driven systems are both needed and challenging to develop. In this tutorial, we introduce recent progress in AI quality assurance, focusing on testing techniques for deep neural networks (DNNs), and provide hands-on experience. We will first discuss in detail how testing AI software differs from testing traditional software. Then, we will provide hands-on tutorials on testing techniques for feed-forward neural networks (FNNs) with a CV use case and for recurrent neural networks (RNNs) with an NLP use case. Finally, we will discuss with the audience the successes and failures in realizing the full potential of AI software testing, as well as possible improvements and research directions. The materials are available at AI Quality Assurance.
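The tutorial abstract does not name specific test-adequacy criteria, but neuron coverage is one widely used criterion for FNNs. The following is a minimal pure-Python sketch, not the tutorial's own material. It rests on a few assumptions: every layer in the hypothetical `example_layers` is treated as a ReLU layer, and a neuron counts as covered when at least one test input drives its activation above `threshold`.

```python
from typing import List, Sequence, Tuple

# A layer is (weights, bias); weights[j] holds the incoming weights of neuron j.
Layer = Tuple[List[List[float]], List[float]]


def relu(xs: Sequence[float]) -> List[float]:
    return [max(0.0, x) for x in xs]


def dense(x: Sequence[float], layer: Layer) -> List[float]:
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def layer_activations(x: Sequence[float], layers: List[Layer]) -> List[List[float]]:
    """Post-ReLU activations of every layer for a single input x."""
    acts, h = [], list(x)
    for layer in layers:
        h = relu(dense(h, layer))
        acts.append(h)
    return acts


def neuron_coverage(inputs: List[Sequence[float]], layers: List[Layer],
                    threshold: float = 0.0) -> float:
    """Fraction of neurons activated above `threshold` by at least one input."""
    total = sum(len(bias) for _, bias in layers)
    covered = set()
    for x in inputs:
        for li, acts in enumerate(layer_activations(x, layers)):
            covered.update((li, ni) for ni, a in enumerate(acts) if a > threshold)
    return len(covered) / total


# Hypothetical 2-input network with one hidden layer of three neurons.
example_layers: List[Layer] = [([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [0.0, 0.0, 0.0])]

if __name__ == "__main__":
    print(neuron_coverage([[1.0, 0.0]], example_layers))                            # 1 of 3 neurons
    print(neuron_coverage([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], example_layers))  # all 3 neurons
```

The sketch only measures coverage. One common use of such a score is as a guide for test generation, where new inputs are sought that activate neurons not yet covered.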
Journal-first Papers
Thu 13 Oct 2022 10:50 - 11:10 at Banquet B - Technical Session 21 - SE for AI II Chair(s): Andrea Stocco
Both industry and academia show a growing demand for employing Deep Learning (DL) to solve real-world problems in various domains. Deep reinforcement learning (DRL) is the application of DL to reinforcement learning. Like any software system, DRL applications can fail because of faults in their programs. In this paper, we present the first attempt to categorize faults occurring in DRL programs. We manually analyzed 761 artifacts of DRL programs (from Stack Overflow posts and GitHub issues) developed using well-known DRL frameworks (OpenAI Gym, Dopamine, Keras-rl, Tensorforce) and identified faults reported by developers/users. We labeled and taxonomized the identified faults through several rounds of discussion. The resulting taxonomy was validated through an online survey of 19 developers/researchers. To enable automatic detection of faults in DRL programs, we defined a meta-model of DRL programs and developed DRLinter, a model-based fault detection approach that leverages static analysis and graph transformations. DRLinter parses a DRL program to generate a model conforming to our meta-model and then applies detection rules to that model to identify fault occurrences. We evaluated DRLinter on 21 synthetic and real faulty DRL programs. For the synthetic samples, we injected faults observed in the analyzed Stack Overflow and GitHub artifacts. The results show that DRLinter successfully detects faults in both synthetic and real-world examples, with a recall of 75% and a precision of 100%.
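The abstract does not reproduce DRLinter's meta-model or its graph-transformation rules. The sketch below only illustrates the general parse-then-check flow it describes, and it relies on several assumptions. A `DRLModel` dataclass of extracted facts stands in for a meta-model instance, and Python's `ast` module stands in for the parser. The two rules (`gamma_out_of_range`, `step_without_reset`) are hypothetical examples, not rules taken from the paper's taxonomy. No graph transformation is performed.

```python
import ast
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class DRLModel:
    """Toy stand-in for a model of a DRL program: facts gathered by static analysis."""
    hyperparams: Dict[str, float] = field(default_factory=dict)
    calls: Set[str] = field(default_factory=set)


def build_model(source: str) -> DRLModel:
    """Parse (never execute) the program and record numeric assignments and call names."""
    model = DRLModel()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            value = node.value.value
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        model.hyperparams[target.id] = float(value)
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute):  # e.g. env.step(...)
                model.calls.add(func.attr)
            elif isinstance(func, ast.Name):  # e.g. range(...)
                model.calls.add(func.id)
    return model


# Hypothetical detection rules; each returns a message or None.
def gamma_out_of_range(m: DRLModel) -> Optional[str]:
    gamma = m.hyperparams.get("gamma")
    if gamma is not None and not 0.0 <= gamma <= 1.0:
        return f"discount factor gamma={gamma} is outside [0, 1]"
    return None


def step_without_reset(m: DRLModel) -> Optional[str]:
    if "step" in m.calls and "reset" not in m.calls:
        return "environment is stepped but never reset"
    return None


RULES: List[Callable[[DRLModel], Optional[str]]] = [gamma_out_of_range, step_without_reset]


def detect_faults(source: str) -> List[str]:
    """Build the toy model once, then apply every rule to it."""
    model = build_model(source)
    return [msg for msg in (rule(model) for rule in RULES) if msg is not None]


BUGGY = '''
import gym
env = gym.make("CartPole-v1")
gamma = 1.5
for t in range(100):
    obs, reward, done, info = env.step(env.action_space.sample())
'''

FIXED = BUGGY.replace("gamma = 1.5", "gamma = 0.99").replace("for t", "obs = env.reset()\nfor t")

if __name__ == "__main__":
    print(detect_faults(BUGGY))  # two findings
    print(detect_faults(FIXED))  # no findings
```

Because the programs are only parsed, the example does not need any DRL framework installed. A realistic linter would need richer facts, such as data flow between the agent and the environment, to apply rules like these reliably.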
Research Papers
Tue 11 Oct 2022 11:10 - 11:30 at Ballroom C East - Technical Session 1 - AI for SE I Chair(s): Andrea Stocco
The aircraft industry constantly strives for design optimization methods that are more efficient in terms of human effort, computation time, and resource consumption. Hybrid surrogate optimization maintains high result quality while providing rapid design assessments, provided that both the surrogate model and the switch mechanism for eventually transitioning to the high-fidelity (HF) model are properly calibrated. Feedforward neural networks (FNNs) can capture highly nonlinear input-output mappings, yielding efficient surrogates for aircraft performance factors. However, FNNs often fail to generalize to out-of-distribution (OOD) samples, which hinders their adoption in critical aircraft design optimization. Through SmOOD, our smoothness-based out-of-distribution detection approach, we propose to co-design a model-dependent OOD indicator with the optimized FNN surrogate to produce a trustworthy surrogate model with selective but credible predictions. Unlike conventional uncertainty-grounded methods, SmOOD exploits the inherent smoothness properties of HF simulations to expose OODs effectively by revealing their suspicious sensitivities, thereby avoiding over-confident uncertainty estimates on OOD samples. With SmOOD, only high-risk OOD inputs are forwarded to the HF model for re-evaluation, yielding more accurate results at a low overhead cost. We investigate three aircraft performance models. The results show that FNN-based surrogates outperform their Gaussian Process counterparts in predictive performance. Moreover, SmOOD covers on average 85% of actual OODs across all study cases. When SmOOD and FNN surrogates are deployed together in hybrid surrogate optimization settings, they reduce the error rate by 34.65% and achieve a 58.36x computational speedup.
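SmOOD co-designs its OOD indicator with the FNN surrogate, and the abstract does not give its exact formulation. The sketch below only illustrates the general mechanism the abstract describes, a sensitivity-based switch in a hybrid surrogate loop. An input whose local finite-difference slope exceeds a threshold calibrated on in-distribution inputs is treated as high-risk and forwarded to the HF model. The names `local_sensitivity`, `calibrate_threshold`, `margin`, `toy_surrogate`, and `toy_hf_model` are hypothetical. The threshold rule is an assumption, not SmOOD's.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Model = Callable[[Vector], float]


def local_sensitivity(f: Model, x: Vector, h: float = 1e-3) -> float:
    """Largest absolute central finite-difference slope of f along any input axis at x."""
    slopes = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        slopes.append(abs(f(up) - f(down)) / (2 * h))
    return max(slopes)


def calibrate_threshold(f: Model, train_inputs: List[Vector], margin: float = 1.5) -> float:
    """Allow up to `margin` times the largest sensitivity seen on in-distribution inputs."""
    return margin * max(local_sensitivity(f, x) for x in train_inputs)


def hybrid_predict(x: Vector, surrogate: Model, hf_model: Model,
                   threshold: float) -> Tuple[float, str]:
    """Answer with the cheap surrogate unless x looks suspiciously sensitive; then defer to HF."""
    if local_sensitivity(surrogate, x) > threshold:
        return hf_model(x), "hf"
    return surrogate(x), "surrogate"


# Hypothetical stand-ins: in practice the surrogate is a trained FNN and hf_model a costly simulation.
def toy_surrogate(x: Vector) -> float:
    return sum(v ** 3 for v in x)  # slope grows quickly outside the [0, 1] training box


def toy_hf_model(x: Vector) -> float:
    return sum(v ** 3 for v in x)  # identical to the surrogate here to keep the example small


TRAIN = [[0.0, 0.0], [0.5, 0.5], [1.0, 0.0], [1.0, 1.0]]

if __name__ == "__main__":
    thr = calibrate_threshold(toy_surrogate, TRAIN)
    print(hybrid_predict([0.9, 0.2], toy_surrogate, toy_hf_model, thr))  # served by the surrogate
    print(hybrid_predict([3.0, 0.0], toy_surrogate, toy_hf_model, thr))  # forwarded to the HF model
```

The finite-difference probe costs 2d extra surrogate evaluations for an input of dimension d, which is cheap compared with an HF simulation. The `margin` parameter trades the number of HF calls against the risk of trusting an OOD prediction.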