Research Papers
Thu 13 Oct 2022 13:50 - 14:10 at Ballroom C East - Technical Session 25 - Software Repairs Chair(s): Yannic Noller

Trained with a sufficiently large training and testing dataset, Deep Neural Networks (DNNs) are expected to generalize. However, in real deployments, inputs may deviate from the training dataset distribution. This is a fundamental issue of using a finite dataset, and it may lead deployed DNNs to mis-predict in production.
Inspired by input-debugging techniques for traditional software systems, we propose a runtime approach to identify and fix failure-inducing inputs in deep learning systems. Specifically, our approach targets DNN mis-predictions caused by unexpected (deviating and out-of-distribution) runtime inputs. Our approach has two steps. First, it recognizes and distinguishes deviating ("unseen" but semantics-preserving) and out-of-distribution inputs from in-distribution inputs. Second, it fixes the failure-inducing inputs by transforming them into inputs from the training set that have similar semantics. We call this process "input reflection" and formulate it as a search problem over the embedding space of the training set.
We implemented a tool called InputReflector based on this two-step approach and evaluated it on three DNN models trained on the CIFAR-10, MNIST, and FMNIST image datasets. The results show that InputReflector can effectively distinguish deviating inputs that retain the semantics of the distribution (e.g., zoomed images) and out-of-distribution inputs from in-distribution inputs. InputReflector repairs deviating inputs and achieves a 30.78% accuracy improvement over the original models. We also illustrate how InputReflector can be used to evaluate tests generated by deep learning testing tools.
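The reflection step can be pictured as a nearest-neighbor search over training-set embeddings. The sketch below is a minimal illustration of that idea rather than the paper's implementation: the `embed_fn` interface, the placeholder embedding, and the Euclidean metric are assumptions made for the example.

```python
# Hypothetical sketch of the "input reflection" step: given a deviating input,
# search the training set's embedding space for the semantically closest
# training sample and use it in place of the failure-inducing input.
import numpy as np

def reflect(deviating_input, train_inputs, train_embeddings, embed_fn):
    """Return the training sample closest to the deviating input in embedding space.

    embed_fn: maps an input to a fixed-size embedding vector (assumed to be an
    auxiliary model trained so that semantically similar inputs lie close together).
    """
    query = embed_fn(deviating_input)                      # shape (d,)
    # Euclidean distance to every training embedding; any metric could be substituted.
    dists = np.linalg.norm(train_embeddings - query, axis=1)
    return train_inputs[int(np.argmin(dists))]

if __name__ == "__main__":
    # Toy usage with random arrays standing in for real images and a real embedder.
    rng = np.random.default_rng(0)
    train_x = rng.normal(size=(1000, 32, 32, 3))           # stand-in for CIFAR-10 images
    fake_embed = lambda x: x.reshape(-1)[:64]              # placeholder embedding function
    train_e = np.stack([fake_embed(x) for x in train_x])
    reflection = reflect(rng.normal(size=(32, 32, 3)), train_x, train_e, fake_embed)
    print(reflection.shape)                                # (32, 32, 3)
```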
Research Papers
Wed 12 Oct 2022 11:40 - 12:00 at Ballroom C East - Technical Session 9 - Security and Privacy Chair(s): Wei Yang

Browser extensions have become integral features of modern browsers, aiming to improve the online browsing experience. Their advantageous position between the user and the Internet grants them easy access to the user's sensitive personal data, which has raised mounting privacy concerns from both legislators and extension users. In this work, we propose an end-to-end automatic extension privacy compliance auditing approach that analyzes the compliance of an extension's privacy policy with regulatory requirements and with its actual privacy-related practices at runtime.
Our approach utilizes the state-of-the-art language processing model BERT to annotate the policy texts, and a hybrid technique to analyze the privacy-related elements (e.g., API calls and HTML objects) in the static source code and in files generated dynamically at runtime. We collect a comprehensive dataset within 42 hours in April 2022, containing a total of 64,114 extensions. To facilitate model training, we construct a corpus named PrivAud-100, which contains 100 manually annotated privacy policies. Based on this dataset and corpus, we conduct a systematic audit and identify widespread privacy compliance issues. We find that around 92% of the extensions have at least one violation in either their privacy policies or their data collection practices. We further propose an index to facilitate the filtering and identification of extensions with a high probability of privacy compliance violations. Our work should raise awareness among extension users, service providers, and platform operators, and encourage them to implement solutions toward better privacy compliance. To facilitate future research in this area, we have released our dataset.
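To make the auditing idea concrete, the hypothetical sketch below compares the data categories an extension is observed collecting at runtime against the categories its privacy policy discloses; the category names and the `find_violations` helper are illustrative assumptions, not the paper's actual interfaces.

```python
# Hypothetical compliance check: data categories the extension is observed
# collecting at runtime (e.g. via API calls such as chrome.history or
# navigator.geolocation) are compared against the categories its privacy
# policy discloses (as labeled by the policy-annotation model). Any
# collected-but-undisclosed category is flagged as a potential violation.

def find_violations(disclosed_categories: set,
                    observed_collection: dict) -> dict:
    """Map each undisclosed data category to the runtime evidence for it."""
    return {category: evidence
            for category, evidence in observed_collection.items()
            if category not in disclosed_categories}

if __name__ == "__main__":
    policy = {"browsing history"}                      # categories the policy discloses
    runtime = {                                        # categories seen being collected
        "browsing history": ["chrome.history.search"],
        "location": ["navigator.geolocation.getCurrentPosition"],
    }
    print(find_violations(policy, runtime))            # {'location': [...]}
```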