Having roots in Search-Based Software Engineering (SBSE), the relatively young area of Genetic Improvement of Software (GI) aims to improve the properties of existing software. It can operate directly on source code, e.g., Java or C, and it typically starts with real-world software. This makes GI attractive for industrial applications, in contrast to, e.g., Genetic Programming, which aims to evolve applications from scratch. In this tutorial, we demonstrate how GI can be used to optimise properties of code such as power consumption, code size, bandwidth, and other non-functional properties, including execution time.
Mature and robust software applications should possess several traits. Among others, they should be secure, provide the functionality desired by the stakeholder(s), be efficient, and be accessible. Unfortunately, despite government legislation and demonstrated need, much of the software being developed today is not created in an accessible manner. The objective of our Accessibility Learning Labs (ALL) is both to inform participants about how to properly create accessible software and, importantly, to demonstrate the need for accessible software. These experiential browser-based activities enable students, instructors, and practitioners to utilize the material using only their browser. This tutorial will benefit a wide range of participants in the software engineering community, ranging from students to experienced practitioners who want to ensure that they are properly creating inclusive, accessible software. Complete project material is publicly available on the project website: http://all.rit.edu
Experience
[experience paper] Although tremendous efforts have been devoted to the quality assurance of online service systems, in reality these systems still experience many incidents (i.e., unplanned interruptions and outages), which can decrease user satisfaction or cause economic loss. To better understand the characteristics of incidents and improve the incident management process, we perform the first large-scale empirical analysis of incidents collected from 18 real-world online service systems in a multinational technology company M. Surprisingly, we find that although a large number of incidents can occur over a short period of time, many of them actually do not matter, i.e., engineers will not fix them with a high priority after manually diagnosing their root cause. We call these incidents incidental incidents. Our qualitative and quantitative analyses show that incidental incidents are significant in terms of both number and cost. Therefore, it is important to prioritize incidents by identifying incidental incidents in advance, so as to optimize incident management efforts. In particular, we propose an approach, called DeepIP (Deep learning based Incident Prioritization), to prioritize incidents based on a large amount of historical incident data. More specifically, we design an attention-based CNN (Convolutional Neural Network) model to learn a prediction model that identifies incidental incidents. We then prioritize all incidents by ranking the predicted probabilities of incidents being incidental. We evaluate the performance of DeepIP using real-world incident data. The experimental results show that DeepIP effectively prioritizes incidents by identifying incidental incidents and significantly outperforms all compared approaches. For example, DeepIP achieves an AUC of 0.808, while that of the best compared approach is only 0.624 on average. We also share our experience and lessons learned from practice.
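As a rough illustration of the final ranking step only (the scoring function below is a hypothetical stand-in for the attention-based CNN, which is not reproduced here):

```python
def prioritize(incidents, incidental_prob):
    """Rank incidents so those least likely to be incidental come first.

    incidental_prob maps an incident to the model's predicted probability
    that engineers will not fix it with high priority.
    """
    return sorted(incidents, key=incidental_prob)

# Hypothetical predictions standing in for the learned model's output:
probs = {"auth outage": 0.05, "disk full": 0.10, "transient spike": 0.90}
ranked = prioritize(list(probs), probs.get)
# Likely-incidental incidents sink to the bottom of the queue.
```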
Bug reports in a repository are generally organized line by line in a list view, with their titles and other metadata displayed. In this list view, a concise and precise title plays an important role, enabling project practitioners to quickly and correctly digest the core idea of a bug without carefully reading the corresponding details. However, the quality of bug report titles varies across open-source communities, which may be due to the limited time and expertise of authors. To help report authors efficiently draft good-quality titles, we propose a method, named iTAPE, to automatically generate titles for their bug reports. iTAPE formulates title generation as a one-sentence summarization task. By properly tackling two domain-specific challenges (i.e., the lack of an off-the-shelf dataset and the handling of low-frequency human-named tokens), iTAPE generates titles using a Seq2Seq-based model. A comprehensive experimental study shows that iTAPE obtains fairly satisfactory results, both in comparison with three recent one-sentence summarization approaches and in terms of feedback from human evaluation.
A Graphical User Interface (GUI) provides a visual bridge between a software application and its end users, through which they interact with each other. With advances in technology and aesthetics, GUI visual effects have become increasingly attractive. However, this GUI complexity poses a great challenge to GUI implementation. According to our pilot study of crowdtesting bug reports, display issues such as text overlap, blurred screens, and missing images often occur during GUI rendering on different devices due to software or hardware incompatibility. They negatively influence app usability, resulting in poor user experience. To detect these issues, we propose a novel approach, OwlEye, based on deep learning, for modelling the visual information of GUI screenshots. OwlEye can thus detect GUIs with display issues and also locate the detailed region of the issue in a given GUI to guide developers in fixing the bug. We manually construct a large-scale labelled dataset of 4,470 GUI screenshots with UI display issues and develop a heuristics-based data augmentation method to boost OwlEye's performance. The evaluation demonstrates that OwlEye achieves 85% precision and 84% recall in detecting UI display issues, and 90% accuracy in localizing these issues. We also evaluate OwlEye with popular Android apps on Google Play and F-Droid, and successfully uncover 57 previously undetected UI display issues, 26 of which have been confirmed or fixed so far.
Experience
Vendors who wish to provide software or services to large corporations and governments must often obtain numerous certificates of compliance. Each certificate asserts that the software satisfies a compliance regime, like SOC or the PCI DSS, to protect the privacy and security of sensitive data. Manual compliance audits of source code (the industry standard) are expensive, error-prone, partial, and prone to regressions.
We propose \emph{continuous compliance} to guarantee that the codebase stays compliant on each code change using lightweight verification tools. Continuous compliance increases assurance and reduces cost in the domain of source-code compliance.
We evaluated continuous compliance by building and deploying verification tools for five common audit controls related to data security: cryptographically unsafe algorithms must not be used, keys must be at least 256 bits long, credentials must not be hard-coded into program text, HTTPS must always be used instead of HTTP, and cloud data stores must not be world-readable. We report on our experience deploying these verification tools at a large company, where they are integrated into the compliance process (including auditors accepting their output as evidence) and have been run on over 68 million lines of code. We open-sourced our tools and applied them to over 5 million lines of open-source software. Compared to other publicly-available tools for detecting misuses of encryption, only ours are suitable for continuous compliance.
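As a toy illustration of the kind of lightweight, per-change check involved (the regex rules and thresholds below are our own sketch, not the actual open-sourced verification tools described above):

```python
import re

# Assumed patterns loosely mirroring three of the audit controls:
WEAK_KEY = re.compile(r"key_size\s*=\s*(\d+)")
PLAIN_HTTP = re.compile(r"\bhttp://")          # matches http:// but not https://
HARDCODED_CRED = re.compile(r"password\s*=\s*['\"][^'\"]+['\"]")

def check_line(line):
    """Return the names of the compliance rules this source line violates."""
    findings = []
    m = WEAK_KEY.search(line)
    if m and int(m.group(1)) < 256:            # keys must be >= 256 bits
        findings.append("weak-key")
    if PLAIN_HTTP.search(line):                # HTTPS must always be used
        findings.append("plain-http")
    if HARDCODED_CRED.search(line):            # no hard-coded credentials
        findings.append("hardcoded-credential")
    return findings
```

Running such checks on every code change (e.g., in CI) is what makes the compliance guarantee continuous rather than a point-in-time audit.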
The security guarantee of SSL/TLS critically depends on the correct validation of X.509 certificates. Therefore, it is important to check whether certificate validation in SSL/TLS is implemented correctly. Differential testing has been used successfully to find semantic bugs in this domain. However, existing differential testing tools suffer from three limitations: (1) inputs are not guaranteed to be syntactically correct; (2) input diversity is insufficient; and (3) a large number of inputs is required to find each semantic bug.
This paper tackles these problems by introducing SADT, a novel syntax-aware differential testing framework for testing certificate validation code. Our core insight is to use tree-based mutation to ensure that generated inputs are syntactically correct, and to diversify inputs by sharing interesting inputs among all tested SSL/TLS implementations. The generated certificates are then employed to reveal discrepancies (potential bugs) among the certificate validation logic of all SSL/TLS implementations.
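The core idea can be illustrated with a toy sketch (nested dicts stand in for X.509 syntax trees, and two hypothetical validators stand in for real SSL/TLS implementations):

```python
import copy
import random

def mutate(cert, rng):
    """Tree-based mutation: tweak one field of a nested-dict 'certificate'
    while preserving its structure, so the result stays syntactically
    well-formed (a stand-in for mutating real X.509 syntax trees)."""
    out = copy.deepcopy(cert)
    key = rng.choice(sorted(out))
    if isinstance(out[key], dict):
        out[key] = mutate(out[key], rng)
    elif isinstance(out[key], int):
        out[key] += rng.choice([-1, 1])
    return out

def differential_test(cert, validators):
    """Flag a discrepancy (potential bug) when implementations disagree."""
    verdicts = {name: accept(cert) for name, accept in validators.items()}
    return len(set(verdicts.values())) > 1, verdicts

# Two toy 'implementations' that disagree on a boundary expiry year:
validators = {
    "strict":  lambda c: c["not_after"] > 2024,
    "lenient": lambda c: c["not_after"] >= 2024,
}
found, verdicts = differential_test({"version": 3, "not_after": 2024}, validators)
mutant = mutate({"version": 3, "not_after": 2024}, random.Random(0))
```

Because mutation operates on the tree rather than on raw bytes, every `mutant` keeps the same fields as its parent; a byte-level fuzzer would mostly produce inputs that fail parsing before reaching validation logic.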
We have implemented the new syntax-aware differential testing framework, named SADT, and evaluated it against other differential testing frameworks (such as NEZHA and RFCcert) and the fuzzer AFL. In our experiment, SADT yields 64 unique discrepancies when 6 SSL/TLS implementations are tested, while NEZHA, RFCcert, and AFL yield 31, 15, and 2 unique discrepancies, respectively. In addition, we have been reporting bugs found by SADT to the software developers. So far, 13 bugs have been confirmed or fixed, 10 of which were previously unknown across all projects.
Serialisation-related security vulnerabilities have recently been reported for numerous Java applications. Since serialisation presents both soundness and precision challenges for static analysis, it can be difficult for analyses to precisely pinpoint serialisation vulnerabilities in a Java library. In this paper, we propose a hybrid approach that extends a static analysis with fuzzing to detect serialisation vulnerabilities. The novelty of our approach is its use of a heap model to direct fuzzing towards vulnerabilities in Java libraries. The advantage is that the analysis guides fuzzing to produce results quickly and effectively, and these results may also automatically validate static analysis reports.
As most smart systems, such as smart logistics and smart manufacturing, are delay-sensitive, the current mainstream cloud-computing-based system architecture faces the critical issue of high latency over the Internet. Meanwhile, as huge amounts of data are generated by smart devices with limited battery and computing power, the increasing demand for energy-efficient machine learning and secure data communication at the network edge has become a hurdle to the success of smart systems. To address these challenges, using a smart UAV (Unmanned Aerial Vehicle) delivery system as an example, we propose EXPRESS, a novel energy-efficient and secure framework based on mobile edge computing and blockchain technologies. We focus on computation and data (resource) management, which are two of the most prominent components in this framework. The effectiveness of the EXPRESS framework is demonstrated through the implementation of a real-world UAV delivery system. As an open-source framework, EXPRESS can help researchers implement their own prototypes and test their computation and data management strategies in different smart systems. The demo video can be found at https://youtu.be/r3U1iU8tSmk.
Literature reviews are essential before conducting research on any topic. A thorough literature review provides an overview of current knowledge, allowing researchers to identify relevant theories, methods, and gaps in the existing research. Yet a thorough literature review is always tedious and time-consuming, especially when a large portion of the time and effort is wasted on citation screening—identifying the dozens of relevant papers from hundreds to thousands of non-relevant search results. In this tutorial, we will demonstrate how FASTREAD, a machine learning tool (https://github.com/fastread/src), helps reduce the time and effort of citation screening by incrementally learning what the user is looking for and thus predicting which papers are more likely to be relevant. In the instructors’ own experience of using this tool, it reduced 50 man-hours (94%) of the citation screening effort and included 90% of the relevant papers.
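The incremental-learning loop behind this idea can be sketched as follows (a toy keyword-overlap scorer; FASTREAD's actual learner uses richer text features and a proper classifier):

```python
def screen(papers, is_relevant, batch=2):
    """Toy incremental screening loop: score unreviewed papers by keyword
    overlap with papers the user already marked relevant, then have the
    user review the highest-scored batch first."""
    keywords = set()                 # words seen in relevant papers so far
    relevant = []
    remaining = list(papers)
    while remaining:
        # Likely-relevant papers bubble up; ties keep their original order.
        remaining.sort(key=lambda p: -len(keywords & set(p.split())))
        for paper in remaining[:batch]:
            if is_relevant(paper):   # stands in for the user's label
                relevant.append(paper)
                keywords |= set(paper.split())
        remaining = remaining[batch:]
    return relevant

papers = ["cooking recipes", "fuzzing survey", "garden tips",
          "fuzzing tools", "fuzzing fundamentals"]
found = screen(papers, lambda p: "fuzzing" in p)
```

After the first relevant paper is labelled, the remaining "fuzzing" papers are reviewed before "garden tips", which is how the tool front-loads the relevant hits and lets the user stop early.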
This tutorial features three sections, each presenting
A wide range of modern software-intensive systems (e.g., autonomous systems, big data analytics, robotics, deep neural architectures) is built to be configurable. These highly configurable systems offer a rich space for adaptation to different domains and tasks. Developers and users often need to reason about the performance of such systems, making tradeoffs to change specific quality attributes or detecting performance anomalies. For instance, the developers of image recognition mobile apps are interested not only in learning which deep neural architectures are accurate enough to classify their images correctly, but also in which architectures consume the least power on the mobile devices on which they are deployed. Recent research has focused on models built from performance measurements obtained by instrumenting the system. However, the fundamental problem is that the learning techniques for building a reliable performance model do not scale well, simply because the configuration space of such systems is exponentially large and impossible to explore exhaustively. For example, it would take over 60 years to explore the whole configuration space of a system with just 25 binary options.
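A quick back-of-the-envelope check of that figure, assuming (optimistically) that measuring one configuration takes a single minute:

```python
# 25 independent binary options span 2^25 configurations.
configs = 2 ** 25                  # 33,554,432 configurations
minutes_per_year = 60 * 24 * 365   # 525,600 minutes in a year
years = configs / minutes_per_year # ~63.8 years at one minute each
```

With slower benchmarks (minutes to hours per measurement, which is typical for big data workloads) the exhaustive-exploration time grows proportionally, hence the need for sample-efficient learning.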
In this tutorial, I will start by motivating the configuration space explosion problem based on my previous experience with large-scale big data systems in industry. I will then present transfer learning, as well as other machine learning techniques including multi-objective Bayesian optimization, to tackle the sample efficiency challenge: instead of taking measurements from the real system, we learn the performance model using samples from cheap sources, such as simulators that approximate the performance of the real system with fair fidelity and at low cost. Results show that despite the high cost of measurement on the real system, learning performance models can become surprisingly cheap as long as certain properties are reused across environments. In the second half of the talk, I will present empirical evidence which lays a foundation for a theory explaining why and when transfer learning works, by showing the similarities of performance behavior across environments. I will present observations of the impacts of environmental changes (such as changes to hardware, workload, and software versions) for a selected set of configurable systems from different domains, to identify the key elements that can be exploited for transfer learning. These observations demonstrate a promising path for building efficient, reliable, and dependable software systems, as well as theoretically sound approaches for tackling performance optimization, testing, and debugging. Finally, I will share some promising potential research directions, including our recent progress on a performance debugging approach based on counterfactual causal inference.
This tutorial is targeted at practitioners as well as researchers who would like to gain a deeper understanding of new and potentially powerful approaches for modern highly configurable systems. It will also be suitable for students (both undergraduate and graduate) who want to learn about potential research directions and how they can find a fruitful niche for research at the intersection of machine learning, systems, and software engineering.
Pooyan Jamshidi is an Assistant Professor at the University of South Carolina. He directs the AISys Lab, where he investigates the development of novel algorithmic and theoretically principled methods for machine learning systems. Prior to his current position, he was a research associate at Carnegie Mellon University and Imperial College London, where he primarily worked on transfer learning for performance understanding of highly-configurable systems, including robotics and big data systems. Pooyan's general research interests are at the intersection of systems/software and machine learning. He received his Ph.D. in Computer Science from Dublin City University in 2014, and his B.S. and M.S. degrees in Math and Computer Science from the Amirkabir University of Technology in 2003 and 2006, respectively.