Causality-Aided Trade-off Analysis for Machine Learning Fairness
Zhenlan Ji (The Hong Kong University of Science and Technology), Pingchuan Ma (HKUST), Shuai Wang (Hong Kong University of Science and Technology), Yanhui Li (Nanjing University)
There is increasing interest in enhancing the fairness of machine learning (ML). Despite the growing number of fairness-improving methods, we lack a systematic understanding of the trade-offs among the factors considered in the ML pipeline when these methods are applied. This understanding is essential for developers to make informed decisions about providing fair ML services. Nonetheless, analyzing these trade-offs is extremely difficult when multiple fairness parameters and other crucial metrics are involved, coupled, and even in conflict with one another.
This paper uses causality analysis as a principled method for analyzing trade-offs between fairness parameters and other crucial metrics in ML pipelines. To conduct causality analysis practically and effectively, we propose a set of domain-specific optimizations that facilitate accurate causal discovery, together with a novel, unified interface for trade-off analysis built on well-established causal inference methods. We conduct a comprehensive empirical study of a collection of widely used fairness-improving techniques on three real-world datasets. Our study yields actionable suggestions for users and developers of fair ML. We further demonstrate the versatility of our approach by using it to select the optimal fairness-improving method, paving the way for more ethical and socially responsible AI technologies.
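To give a feel for the kind of causal-inference query a trade-off analysis asks, the following minimal sketch estimates the average effect of applying a fairness-improving method (treatment `t`) on a metric `y` such as accuracy, adjusting for one discrete confounder `z` with the backdoor formula. This is an illustrative assumption, not the paper's interface or causal-discovery pipeline: the function name `backdoor_ate`, the confounder (a hypothetical model family), and the toy records are all invented for the example.

```python
from collections import defaultdict


def backdoor_ate(records):
    """Estimate E[Y | do(T=1)] - E[Y | do(T=0)] with backdoor adjustment.

    records: list of (z, t, y) tuples, where z is a discrete confounder,
    t in {0, 1} marks whether the (hypothetical) fairness method was applied,
    and y is the outcome metric. Uses E[Y | do(t)] = sum_z E[Y | t, z] * P(z).
    """
    n = len(records)
    count_z = defaultdict(int)      # occurrences of each confounder value
    sum_y = defaultdict(float)      # sum of y per (z, t) stratum
    count_zt = defaultdict(int)     # size of each (z, t) stratum
    for z, t, y in records:
        count_z[z] += 1
        sum_y[(z, t)] += y
        count_zt[(z, t)] += 1

    expected = {}
    for t in (0, 1):
        total = 0.0
        for z, cz in count_z.items():
            if count_zt[(z, t)] == 0:
                # Adjustment is undefined if a stratum never received this treatment.
                raise ValueError(f"no records with z={z!r}, t={t}; positivity violated")
            total += (sum_y[(z, t)] / count_zt[(z, t)]) * (cz / n)
        expected[t] = total
    return expected[1] - expected[0]


# Toy, made-up records: (model family, fairness method applied?, accuracy).
records = [
    ("lr", 0, 0.85), ("lr", 0, 0.83), ("lr", 1, 0.80), ("lr", 1, 0.82),
    ("nn", 0, 0.90), ("nn", 1, 0.86),
]
ate = backdoor_ate(records)
print(f"estimated effect of the method on accuracy: {ate:.4f}")
```

A negative estimate would be read as an accuracy cost of the method; repeating the same query with a fairness metric as `y` gives the other side of the trade-off.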
Towards Self-Adaptive Machine Learning-Enabled Systems Through QoS-Aware Model Switching
Machine Learning (ML), particularly deep learning, has seen vast advancements, leading to the rise of Machine Learning-Enabled Systems (MLS). However, numerous software engineering challenges persist in bringing these MLS into production, largely due to various run-time uncertainties that affect the overall Quality of Service (QoS). These uncertainties stem from ML models, software components, and environmental factors. Self-adaptation techniques show promise for managing run-time uncertainties, but their application in MLS remains largely unexplored. As a solution, we propose the concept of a Machine Learning Model Balancer, which manages uncertainties related to ML models by using multiple models. We then introduce AdaMLS, a novel self-adaptation approach that leverages this concept and extends the traditional MAPE-K loop for continuous MLS adaptation. AdaMLS employs lightweight unsupervised learning for dynamic model switching, thereby ensuring consistent QoS. Through a prototype self-adaptive object detection system, we demonstrate AdaMLS's effectiveness in balancing system and model performance. Preliminary results suggest that AdaMLS surpasses naive approaches and single state-of-the-art models in QoS guarantees, marking a step toward self-adaptive MLS with optimal QoS in dynamic environments.
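A MAPE-K loop for model switching can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not AdaMLS itself: AdaMLS uses lightweight unsupervised learning to drive switching, whereas this sketch substitutes a plain latency-budget rule. The class names, the model names `nano`, `small`, and `medium`, and their latency and accuracy figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    name: str
    latency_ms: float  # expected latency under nominal load (hypothetical figure)
    accuracy: float    # expected accuracy, e.g. mAP (hypothetical figure)


class ModelBalancer:
    """Toy MAPE-K loop that switches models to keep latency within a budget."""

    def __init__(self, models, latency_budget_ms):
        # Knowledge: model profiles (most accurate first) and the deployed model.
        self.models = sorted(models, key=lambda m: m.accuracy, reverse=True)
        self.budget = latency_budget_ms
        self.current = self.models[0]

    def monitor(self, observed_latency_ms):
        # Monitor: observed vs. profiled latency approximates the current load
        # (an assumption; a real system would track richer QoS signals).
        return observed_latency_ms / self.current.latency_ms

    def analyze(self, load):
        # Analyze: which models are predicted to meet the budget under this load?
        return [m for m in self.models if m.latency_ms * load <= self.budget]

    def plan(self, feasible):
        # Plan: the most accurate feasible model, else the fastest as a fallback.
        if feasible:
            return feasible[0]
        return min(self.models, key=lambda m: m.latency_ms)

    def execute(self, target):
        # Execute: swap the deployed model if the plan differs from it.
        if target is not self.current:
            self.current = target

    def step(self, observed_latency_ms):
        load = self.monitor(observed_latency_ms)
        self.execute(self.plan(self.analyze(load)))
        return self.current.name


models = [
    ModelProfile("nano", 10, 0.37),
    ModelProfile("small", 20, 0.45),
    ModelProfile("medium", 45, 0.50),
]
balancer = ModelBalancer(models, latency_budget_ms=50)
trace = [balancer.step(obs) for obs in (45, 90, 20, 200, 100)]
print(trace)
```

Re-planning on every iteration lets the loop switch back to a more accurate model once load drops, not only downgrade under pressure.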
Artificial Intelligence (AI)-enabled embedded devices are becoming increasingly important in healthcare, where they assist physicians, clinicians, and surgeons in diagnosis, therapy planning, and rehabilitation. However, designing an accurate and efficient machine learning model for resource-limited devices that operate 24/7 remains challenging, and doing so requires both intuition and experience. This dependence on human expertise and reliance on trial-and-error design methods impede the standard processes of effort estimation, design-phase planning, and drafting service-level agreements for projects that involve AI-enabled MedTech devices.
In this paper, we present AutoML search from an algorithmic perspective, rather than from the more prevalent optimization or black-box-tool perspective. We briefly present, and point to, case studies that demonstrate the efficacy of this automation approach in terms of productivity improvements. We believe that our proposed method can make AutoML more amenable to the application of software engineering principles and can also accelerate biomedical device engineering, a field that depends heavily on skilled human resources.
Cell2Doc: ML Pipeline for Generating Documentation in Computational Notebooks
Computational notebooks have become the go-to way of solving data-science problems. Although they are designed to combine code and documentation, prior work shows that developers largely neglect documentation because of the manual effort it requires. Automated documentation generation can help, but existing techniques fail to capture algorithmic details, and developers often end up editing the generated text to add explanations and sub-steps. This paper proposes Cell2Doc, a novel machine-learning pipeline for documenting code cells in Python data-science notebooks. Our approach identifies the different logical contexts within a code cell, generates documentation for each separately, and finally combines them into documentation for the entire code cell. Cell2Doc takes advantage of the capabilities of existing pre-trained language models and improves their efficiency for code cell documentation. We also provide a new benchmark dataset for this task, along with a data-preprocessing pipeline that can be used to create new datasets. Additionally, we investigate an appropriate input representation for this task. Our automated evaluation suggests that our best input representation improves the pre-trained model's performance by 2.5x on average. Further, in human evaluation, Cell2Doc achieves a 1.33x improvement in correctness, informativeness, and readability over the corresponding standalone pre-trained model.
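The split, document, and combine steps described above can be sketched with Python's standard `ast` module. This is an assumption-laden illustration, not Cell2Doc's actual segmentation or model: here a new "logical context" starts wherever top-level statements are separated by a blank or comment line, and the hypothetical `summarize` function merely lists the functions called, standing in for a pre-trained code-to-text model.

```python
import ast


def split_contexts(cell_source):
    """Group a code cell's top-level statements into logical contexts.

    Heuristic (an assumption): a new context begins whenever two statements
    are separated by at least one blank or comment line.
    """
    tree = ast.parse(cell_source)
    lines = cell_source.splitlines()
    groups = []
    prev_end = None
    for stmt in tree.body:
        if prev_end is None or stmt.lineno - prev_end > 1:
            groups.append([stmt])
        else:
            groups[-1].append(stmt)
        prev_end = stmt.end_lineno
    # Pair each context's source text with its AST nodes.
    return [("\n".join(lines[g[0].lineno - 1 : g[-1].end_lineno]), g) for g in groups]


def summarize(stmts):
    """Hypothetical stand-in for a pre-trained model: lists called functions."""
    called = []
    for stmt in stmts:
        for node in ast.walk(stmt):
            if isinstance(node, ast.Call):
                func = node.func
                name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
                if name and name not in called:
                    called.append(name)
    return "Calls " + ", ".join(called) if called else "Defines values"


def document_cell(cell_source):
    # Document each context separately, then combine into one cell-level description.
    contexts = split_contexts(cell_source)
    return "\n".join(f"{i}. {summarize(stmts)}" for i, (_, stmts) in enumerate(contexts, 1))


cell = '''import pandas as pd
df = pd.read_csv("train.csv")

df = df.dropna()
df = df.drop_duplicates()

print(len(df))'''
print(document_cell(cell))
```

Splitting on the AST rather than on raw text keeps multi-line statements intact within a single context.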
Uncertain, unpredictable, real-time, and lifelong evolution causes operational failures in intelligent software systems, leading to significant damage, safety and security hazards, and even tragedies. To fully unleash the potential of such systems and facilitate their wider adoption, the prime challenge is ensuring the trustworthiness of their decision-making under uncertainty. To overcome this challenge, an intelligent software system and its operating environment should be continuously monitored, tested, and refined throughout its operational lifetime. Existing technologies, such as digital twins, can synchronise continuously with such systems to reflect their most up-to-date states. These representations often take the form of prior-knowledge-based and machine-learning models, collectively called the 'model universe'. In this paper, we present our vision of combining techniques from software engineering, evolutionary computation, and machine learning to support the evolution of the model universe.