B-AIS: An Automated Process for Black-box Evaluation of AI-enabled Software Systems against Domain Semantics
AI-enabled software systems (AIS) are prevalent in a wide range of applications, such as the visual tasks of autonomous systems that are extensively deployed in the automotive, aerial, and naval domains. Hence, it is crucial for humans to evaluate a model's intelligence before an AIS is deployed to safety-critical environments, such as public roads.
In this paper, we assess AIS visual intelligence by measuring how completely it perceives a domain's primary concepts and their variants. For instance, is the visual perception of an autonomous detector mature enough to recognize instances of \textit{pedestrian} (a concept of the automotive domain) in Halloween costumes? An AIS will be more reliable once the model's ability to perceive a concept is expressed in a human-understandable language. For instance, is a pedestrian in a \textit{wheelchair} mistakenly recognized as a pedestrian on a \textit{bike}, since the domain concepts bike and wheelchair both share the feature \textit{wheel}?
We answer questions of this kind by implementing a generic process within a framework, called B-AIS, which systematically evaluates AIS perception against the semantic specifications of a domain while treating the model as a black box. Semantics is the meaning and understanding of words in a language and is therefore more comprehensible to the human brain than the pixel-level visual information of an AIS. B-AIS processes the heterogeneous artifacts to make them comparable and leverages the comparison results to reveal AIS weaknesses in a human-understandable language. The evaluations of B-AIS on the vision task of pedestrian detection showed that B-AIS identified the missing variants of the pedestrian concept with $F_{2}$ measures of 95% in the dataset and 85% in the model.
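To make the idea of checking a black-box detector against a domain's semantic specification concrete, here is a minimal, hypothetical sketch (not the B-AIS implementation). The variant list, the detector wrapper, and the $F_{2}$ computation from raw counts are illustrative assumptions.

# Hypothetical sketch of a black-box semantic coverage check in the spirit of B-AIS.
# The domain specification, detector wrapper, and test data are illustrative stand-ins.

def f_beta(tp, fp, fn, beta=2.0):
    """F_beta from raw counts; beta=2 weights recall higher than precision."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Domain semantics: a concept and its human-understandable variants (assumed spec).
PEDESTRIAN_VARIANTS = ["walking", "in_wheelchair", "with_stroller", "in_costume"]

def missing_variants(detector, labeled_images):
    """labeled_images: list of (image, variant) pairs where a pedestrian is present.
    Returns the variants the black-box detector never recognizes as 'pedestrian'."""
    seen = {v: False for v in PEDESTRIAN_VARIANTS}
    for image, variant in labeled_images:
        if detector(image) == "pedestrian":
            seen[variant] = True
    return [v for v, found in seen.items() if not found]

if __name__ == "__main__":
    # Toy detector that fails on wheelchair users (illustration only).
    toy = lambda img: "pedestrian" if img["variant"] != "in_wheelchair" else "cyclist"
    data = [({"variant": v}, v) for v in PEDESTRIAN_VARIANTS]
    print("missed variants:", missing_variants(toy, data))
    print("example F2:", round(f_beta(tp=19, fp=1, fn=2), 3))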
Automatic Generation of Visualizations for Machine Learning Pipelines
Lei Liu (Fujitsu Laboratories of America, Inc.), Wei-Peng Chen (Fujitsu Research of America, Inc.), Mehdi Bahrami (Fujitsu Laboratories of America, Inc.), Mukul Prasad (Amazon Web Services)
Visualization is essential for machine learning (ML) pipelines: it exposes data explorations that can inspire data scientists and provides explanations of the pipeline that improve understandability and trust. In this paper, we present a novel approach that automatically generates visualizations for ML pipelines by learning from highly voted Kaggle pipelines. The solution extracts both code and dataset features from these high-quality, human-written pipelines and their corresponding training datasets, learns mapping rules from code and dataset features to visualizations using association rule mining (ARM), and finally uses the learned rules to predict visualizations for unseen ML pipelines. The evaluation results show that the proposed solution is feasible and effective at generating visualizations for ML pipelines.
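As an illustration of the rule-mining step described above, the following is a minimal sketch, not the authors' system: it assumes each pipeline has already been reduced to a set of code/dataset feature tags plus the visualizations it contains, mines single-feature-to-visualization rules with simple support and confidence thresholds, and then uses the rules to recommend visualizations for an unseen pipeline. The feature names and thresholds are illustrative.

# Minimal, hypothetical association-rule-mining sketch: map pipeline features to
# visualizations. Feature names and thresholds are illustrative, not from the paper.
from collections import Counter
from itertools import product

def mine_rules(pipelines, min_support=0.3, min_confidence=0.6):
    """pipelines: list of (features, visualizations) where both are sets of strings.
    Returns rules {feature: [(visualization, confidence), ...]}."""
    n = len(pipelines)
    feat_count, pair_count = Counter(), Counter()
    for features, visuals in pipelines:
        feat_count.update(features)
        pair_count.update(product(features, visuals))
    rules = {}
    for (feat, vis), c in pair_count.items():
        support, confidence = c / n, c / feat_count[feat]
        if support >= min_support and confidence >= min_confidence:
            rules.setdefault(feat, []).append((vis, round(confidence, 2)))
    return rules

def predict(rules, features):
    """Recommend visualizations for an unseen pipeline's extracted features."""
    return sorted({vis for f in features for vis, _ in rules.get(f, [])})

if __name__ == "__main__":
    corpus = [
        ({"has_numeric_cols", "calls_corr"}, {"heatmap"}),
        ({"has_numeric_cols", "calls_value_counts"}, {"heatmap", "bar_chart"}),
        ({"has_categorical_cols", "calls_value_counts"}, {"bar_chart"}),
    ]
    rules = mine_rules(corpus)
    print(rules)
    print(predict(rules, {"has_numeric_cols"}))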
SmOOD: Smoothness-based Out-of-Distribution Detection Approach for Surrogate Neural Networks in Aircraft Design
The aircraft industry is constantly striving for more efficient design optimization methods in terms of human effort, computation time, and resource consumption. Hybrid surrogate optimization maintains high result quality while providing rapid design assessments when both the surrogate model and the switch mechanism for eventually transitioning to the high-fidelity (HF) model are calibrated properly. Feedforward neural networks (FNNs) can capture highly nonlinear input-output mappings, yielding efficient surrogates for aircraft performance factors. However, FNNs often fail to generalize over out-of-distribution (OOD) samples, which hinders their adoption in critical aircraft design optimization. Through SmOOD, our smoothness-based out-of-distribution detection approach, we propose to co-design a model-dependent OOD indicator with the optimized FNN surrogate, producing a trustworthy surrogate model with selective but credible predictions. Unlike conventional uncertainty-grounded methods, SmOOD exploits the inherent smoothness properties of the HF simulations to expose OODs effectively by revealing their suspicious sensitivities, thereby avoiding over-confident uncertainty estimates on OOD samples. With SmOOD, only high-risk OOD inputs are forwarded to the HF model for re-evaluation, leading to more accurate results at a low overhead cost. Three aircraft performance models are investigated. The results show that FNN-based surrogates outperform their Gaussian Process counterparts in predictive performance. Moreover, SmOOD covers, on average, 85% of the actual OODs across all study cases. When SmOOD and FNN surrogates are deployed in hybrid surrogate optimization settings, they yield a 34.65% reduction in error rate and a 58.36x computational speed-up.
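The core idea above — flag inputs where the surrogate's local sensitivity looks unusually high and route them to the HF model — can be illustrated with a small, hypothetical sketch (not the SmOOD implementation). It assumes a surrogate callable, uses a finite-difference sensitivity estimate, and calibrates the threshold as a high percentile of sensitivities on the training inputs; the 95th percentile and the toy functions are illustrative assumptions.

# Hypothetical smoothness-based OOD routing sketch; the surrogate, HF model, and
# threshold choice (95th percentile) are illustrative assumptions, not SmOOD itself.
import numpy as np

def sensitivity(surrogate, x, eps=1e-3):
    """Finite-difference estimate of the local gradient norm of the surrogate at x."""
    base = surrogate(x)
    grads = []
    for i in range(x.size):
        xp = x.copy()
        xp[i] += eps
        grads.append((surrogate(xp) - base) / eps)
    return float(np.linalg.norm(grads))

def calibrate_threshold(surrogate, train_inputs, percentile=95):
    scores = [sensitivity(surrogate, x) for x in train_inputs]
    return float(np.percentile(scores, percentile))

def evaluate(surrogate, hf_model, x, threshold):
    """Use the cheap surrogate unless the input looks OOD, then fall back to HF."""
    if sensitivity(surrogate, x) > threshold:
        return hf_model(x), "hf"
    return surrogate(x), "surrogate"

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    surrogate = lambda x: float((x ** 3).sum())          # stand-in FNN surrogate
    hf_model = lambda x: float((x ** 3).sum() + 0.01)    # stand-in HF simulation
    train = [rng.normal(size=4) for _ in range(200)]
    thr = calibrate_threshold(surrogate, train)
    print(evaluate(surrogate, hf_model, rng.normal(size=4), thr))
    print(evaluate(surrogate, hf_model, np.full(4, 50.0), thr))  # far-from-training input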
Unveiling Hidden DNN Defects with Decision-Based Metamorphic Testing
Contemporary DNN testing is frequently conducted using metamorphic testing (MT). In general, de facto MT frameworks mutate DNN input images with semantics-preserving mutations and check whether the DNN yields consistent predictions. Nevertheless, we find that DNNs may rely on erroneous decisions to make predictions, and these predictions may still remain consistent by chance. Such DNN defects are neglected by existing MT frameworks. Erroneous decisions, however, are likely to result in successive mispredictions over the diverse images that may arise in real-life scenarios.
This research aims to unveil the pervasiveness of hidden DNN defects caused by incorrect DNN decisions that nonetheless retain consistent DNN predictions. To do so, we tailor and optimize modern eXplainable AI (XAI) techniques to identify visual concepts, i.e., the regions in an input image upon which the DNN bases its predictions. Then, we extend existing MT-based DNN testing frameworks to check the consistency of the DNN decisions made over a test input and its mutated variants. Our evaluation shows that existing MT frameworks are oblivious to a considerable number of DNN defects caused by erroneous decisions. We conduct human evaluations to justify the validity of our findings and to elucidate their characteristics. Through the lens of DNN decision-based metamorphic relations, we re-examine the effectiveness of the metamorphic transformations proposed by existing MT frameworks. We summarize lessons from this study, which can provide insights and guidelines for future DNN testing.
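A minimal sketch of the decision-consistency check described above (illustrative only, not the paper's framework): assume a hypothetical explain(model, image) that returns a binary saliency mask for the predicted class; the extended oracle requires both the prediction and the decision region (compared via IoU) to stay consistent under a semantics-preserving mutation. The toy model, explainer, mutation, and IoU threshold are assumptions.

# Hypothetical decision-based metamorphic oracle; `model` and `explain` are assumed
# stand-ins (e.g., a classifier and an XAI saliency method returning a binary mask).
import numpy as np

def iou(mask_a, mask_b):
    """Intersection-over-union of two boolean decision-region masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 1.0

def decision_based_oracle(model, explain, image, mutate, iou_threshold=0.5):
    """Return (prediction_consistent, decision_consistent) for one mutation.
    A result of (True, False) flags a hidden defect: the prediction survived the
    mutation, but the decision region the DNN relied on did not."""
    mutated = mutate(image)
    pred_ok = model(image) == model(mutated)
    decision_ok = iou(explain(model, image), explain(model, mutated)) >= iou_threshold
    return pred_ok, decision_ok

if __name__ == "__main__":
    # Toy stand-ins: "images" are 8x8 arrays; the model predicts by mean intensity,
    # and the fake explainer highlights the brightest quadrant.
    def model(img):
        return int(img.mean() > 0.3)
    def explain(model, img):
        mask = np.zeros_like(img, dtype=bool)
        qs = [img[:4, :4], img[:4, 4:], img[4:, :4], img[4:, 4:]]
        best = int(np.argmax([q.mean() for q in qs]))
        r, c = divmod(best, 2)
        mask[r * 4:(r + 1) * 4, c * 4:(c + 1) * 4] = True
        return mask
    rng = np.random.default_rng(1)
    img = rng.random((8, 8))
    brighten = lambda im: np.clip(im + 0.05, 0.0, 1.0)  # spatially aligned, semantics-preserving
    print(decision_based_oracle(model, explain, img, brighten))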
Patching Weak Convolutional Neural Network Models through Modularization and Composition
Despite their great success in many applications, deep neural networks are not always robust in practice. For instance, a convolutional neural network (CNN) model for classification tasks often performs unsatisfactorily on some particular classes of objects. In this work, we are concerned with patching the weak part of a CNN model instead of improving it through the costly retraining of the entire model. Inspired by the fundamental concepts of modularization and composition in software engineering, we propose a structured modularization approach, CNNSplitter, which decomposes a strong CNN model for $N$-class classification into $N$ smaller CNN modules. Each module is a sub-model containing a part of the convolution kernels of the strong model. To patch a weak CNN model that performs unsatisfactorily on a target class (TC), we compose the weak CNN model with the corresponding module obtained from a strong CNN model. The ability of the weak CNN model to recognize the TC can thus be improved through patching. Moreover, its ability to recognize non-TCs is also improved, as samples misclassified as the TC can be correctly classified as non-TCs. Experimental results with two representative CNNs on three widely used public datasets show that the average improvements on the TC in terms of precision and recall are 12.54% and 2.14%, respectively. Moreover, patching improves the accuracy on non-TCs by 1.18%. The results demonstrate that CNNSplitter can patch a weak CNN model through modularization and composition, thus providing a new solution for developing robust CNN models.
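A small, hypothetical sketch of the patch-by-composition idea (not CNNSplitter itself): assume the weak model and the target-class module are available as callables that return class probabilities and a target-class score, respectively; the composed predictor defers to the module when it is confident about the target class. The 0.5 confidence cutoff and the toy models are illustrative assumptions.

# Hypothetical composition of a weak classifier with a single-class module, in the
# spirit of patching via modularization; models and the cutoff are illustrative.
import numpy as np

def compose(weak_model, tc_module, target_class, cutoff=0.5):
    """Return a patched predictor. weak_model(x) -> class probabilities (1D array);
    tc_module(x) -> confidence in [0, 1] that x belongs to the target class."""
    def patched(x):
        probs = np.asarray(weak_model(x), dtype=float)
        tc_score = float(tc_module(x))
        if tc_score >= cutoff:
            return target_class                      # module takes over for the TC
        if int(np.argmax(probs)) == target_class:
            return int(np.argsort(probs)[-2])        # demote unconfident TC claims
        return int(np.argmax(probs))
    return patched

if __name__ == "__main__":
    # Toy 3-class weak model that confuses class 2 (the TC) with class 1.
    weak = lambda x: [0.2, 0.5, 0.3] if x["looks_like"] == 2 else [0.6, 0.3, 0.1]
    module = lambda x: 0.9 if x["looks_like"] == 2 else 0.1
    patched = compose(weak, module, target_class=2)
    print(patched({"looks_like": 2}))  # 2: the module rescues the misclassified TC sample
    print(patched({"looks_like": 0}))  # 0: non-TC behaviour of the weak model preserved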
Safety and Performance, Why not Both? Bi-Objective Optimized Model Compression toward AI Software Deployment
The size of deep learning models in artificial intelligence (AI) software is increasing rapidly, which hinders their large-scale deployment on resource-restricted devices (e.g., smartphones). To mitigate this issue, AI software compression plays a crucial role; it aims to compress model size while maintaining high performance. However, intrinsic defects in the big model may be inherited by the compressed one. Such defects can easily be leveraged by attackers, since compressed models are usually deployed on a large number of devices without adequate protection. In this paper, we address the safe model compression problem from a safety-performance co-optimization perspective. Specifically, inspired by the test-driven development (TDD) paradigm in software engineering, we propose a test-driven sparse training framework called SafeCompress. By simulating the attack mechanism it must defend against, SafeCompress can automatically compress a big model into a sparse one. Further, considering a representative attack, i.e., membership inference attack (MIA), we develop a concrete safe model compression mechanism called MIA-SafeCompress. Extensive experiments are conducted to evaluate MIA-SafeCompress on five datasets for both computer vision and natural language processing tasks. The results verify the effectiveness and generalization of our method. We also discuss how to adapt SafeCompress to attacks other than MIA, demonstrating the flexibility of SafeCompress.
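The test-driven loop sketched above can be illustrated with a small hypothetical driver (not the SafeCompress implementation): assume callables that prune a model to a given sparsity, measure task accuracy, and simulate a membership inference attack; candidate compressed models are accepted only if they keep utility high and attack advantage low. The thresholds, the random search, and the toy stand-ins are illustrative assumptions.

# Hypothetical test-driven compression loop in the spirit of SafeCompress / TDD:
# every candidate sparse model must pass both a utility test and a simulated
# membership-inference "attack test" before it is accepted. All callables,
# thresholds, and the toy model are illustrative assumptions.
import random

def test_driven_compress(prune, accuracy, mia_advantage, sparsities,
                         min_accuracy=0.85, max_mia_advantage=0.05, trials=20):
    """Search candidate sparsity levels; keep the sparsest model passing both tests."""
    best = None
    for _ in range(trials):
        s = random.choice(sparsities)
        candidate = prune(s)
        if accuracy(candidate) < min_accuracy:             # failing utility test
            continue
        if mia_advantage(candidate) > max_mia_advantage:   # failing attack test
            continue
        if best is None or candidate["sparsity"] > best["sparsity"]:
            best = candidate
    return best

if __name__ == "__main__":
    random.seed(0)
    # Toy stand-ins: accuracy degrades and MIA advantage shrinks as sparsity grows.
    prune = lambda s: {"sparsity": s}
    accuracy = lambda m: 0.95 - 0.1 * m["sparsity"]
    mia_advantage = lambda m: 0.08 * (1.0 - m["sparsity"])
    result = test_driven_compress(prune, accuracy, mia_advantage,
                                  sparsities=[0.5, 0.7, 0.9, 0.95])
    print(result)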