Workshop
Abstract: Achieving successful technology adoption in industrial practice has long been an important goal for both academic and industrial researchers. However, it is generally challenging for researchers to transfer research results into industrial practice, or to collaborate with industrial practitioners on solutions that are successfully adopted. This talk presents experiences and lessons learned from years of collaborating with industrial practitioners to develop and deploy solutions for mobile app testing and analysis at major companies such as Tencent (WeChat) and Alibaba.
Short Bio: Tao Xie is a Chair Professor in the Department of Computer Science and Technology at Peking University, Beijing, China. He received an NSF CAREER Award, the IEEE CS TCSE Distinguished Service Award, and various industrial faculty awards and distinguished/best paper awards. He is a co-Editor-in-Chief of the Wiley journal Software Testing, Verification and Reliability (STVR). He served as the ISSTA 2015 Program Chair and the Tapia 2017/2018 Program/General Chair, and serves as an ICSE 2021 Program Co-Chair. He was selected by Lero as a David Lorge Parnas Fellow in 2019. He was selected as an ACM Distinguished Scientist in 2015, an IEEE Fellow in 2018, and an AAAS Fellow in 2019.
Workshop
Mobile apps have become the most popular way of accessing the Internet and performing daily tasks such as reading, shopping, banking, and chatting. Unlike traditional desktop applications, mobile apps are typically developed under time-to-market pressure and face severe competition. For app developers and companies, it is therefore crucial to accelerate the app development process. Toward this goal, we leverage large-scale collections of mobile apps as big data and apply data-driven methods, such as deep learning and program analysis, to automate parts of the app development process. In this talk, I will introduce some of my latest research on automated software engineering within the app development cycle, including extracting storyboards from apps for requirements engineering, constructing a UI design gallery, automatically converting UI designs into code, mobile app GUI animation testing, and accessibility testing.
I am a lecturer (a.k.a. Assistant Professor) at Monash University. My primary research interests are in Software Engineering, Applied Data Analysis, Deep Learning, and Human-Computer Interaction. I am especially interested in applying data analysis to automated software development and mining software repositories.
Workshop
Conducting measurement-based experiments is fundamental for assessing the quality of Android apps in terms of, e.g., energy consumption, CPU, and memory usage. However, orchestrating such experiments is not trivial: it requires a large amount of boilerplate code, careful setup of measurement tools, and the adoption of various empirical best practices scattered across the literature. Together, these factors slow down scientific advancement and harm the replicability of experiments in the mobile software engineering area. In this paper, we present Android Runner (AR), a framework for automatically executing measurement-based experiments on native and web apps running on Android devices. In AR, an experiment is defined once in a descriptive fashion, and its execution is then fully automatic, customizable, and replicable. AR is implemented in Python and can be extended with third-party profilers. AR has been used in more than 25 scientific studies, primarily targeting performance and energy efficiency.
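To make the descriptive style concrete, here is a minimal sketch of what such an experiment definition could look like, written as a Python dictionary; the field names and values are illustrative assumptions, not AR's actual configuration schema:

```python
# Hypothetical AR-style experiment definition; field names are illustrative
# assumptions, not Android Runner's actual configuration schema.
experiment = {
    "type": "native",                 # native Android app vs. web app
    "devices": ["nexus_6p"],          # target devices
    "repetitions": 20,                # repeated runs for statistical power
    "duration": 120000,               # duration of each run, in milliseconds
    "profilers": {                    # pluggable third-party profilers
        "batterystats": {},
        "android": {"sample_interval": 100, "data_points": ["cpu", "memory"]},
    },
    "apps": ["apps/app-under-test.apk"],
}
```

Once such a definition is written, the framework takes over the orchestration: installing the app, running the repetitions, and collecting profiler output, which is what makes the runs replicable.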
Workshop
Nowadays, energy efficiency is recognized as a core quality attribute of applications (apps) running on battery-constrained, Android-powered devices. Indeed, energy-hogging apps are a liability to both the end user and the software developer. Yet, there are very few tools available to help developers increase the quality of their native code by ridding it of energy-related bugs. Android Studio is the official IDE for millions of developers worldwide, and there is no better place to enforce green coding rules in everyday projects. Indeed, Android Studio provides a code scanning tool called Android lint that can be extended with the currently missing green checks in order to foster the design of more eco-responsible apps.
The All-SAT (All-SATisfiable) problem focuses on finding all satisfiable assignments of a given propositional formula; its applications include model checking, automata construction, and logic minimization. A typical All-SAT solver is based on iteratively computing satisfiable assignments of the given formula. In this work, we introduce BASolver, a backbone-based All-SAT solver for propositional formulas. Compared to existing approaches, BASolver generates shorter blocking clauses by removing backbone variables from the partial assignments and the blocking clauses. We compare BASolver with four existing All-SAT solvers, namely MBlocking, BC, BDD, and NBC. Experimental results indicate that although finding all the backbone variables consumes additional computing time, BASolver is still more efficient than the existing solvers because of the shorter blocking clauses and the backbone variables it uses.
Of the 608 benchmark formulas, BASolver solves the largest number of formulas (86), which is 22%, 36%, 68%, and 86% more than MBlocking, BC, NBC, and BDD, respectively. For the formulas solved by both BASolver and the other solvers, BASolver uses 88.4% less computing time on average. For the 215 formulas for which the first 1000 satisfiable assignments are found by at least one of the solvers, BASolver uses 180% less computing time on average than the other solvers.
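The core idea of shortening blocking clauses with backbones can be sketched as a standard model-enumeration loop; the following Python/z3 sketch illustrates the general technique, not BASolver's actual implementation:

```python
from z3 import And, Bools, Or, Solver, sat

def all_sat(formula, variables, backbone_names=frozenset()):
    # Standard All-SAT loop: after each model, add a blocking clause that
    # excludes it. Backbone variables take the same value in every model,
    # so omitting them yields shorter blocking clauses (BASolver's idea).
    solver = Solver()
    solver.add(formula)
    while solver.check() == sat:
        model = solver.model()
        yield {str(v): model.eval(v, model_completion=True) for v in variables}
        block = [v != model.eval(v, model_completion=True)
                 for v in variables if str(v) not in backbone_names]
        if not block:
            return  # only backbone variables remain: a single model exists
        solver.add(Or(block))

a, b, c = Bools("a b c")
# c is a backbone (true in every model), so it never appears in a blocking clause.
for m in all_sat(And(Or(a, b), c), [a, b, c], backbone_names={"c"}):
    print(m)
```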
A common problem in MPI programs is deadlock: when two or more processes are blocked indefinitely due to a circular communication dependency. Automatically detecting deadlock is difficult due to its schedule-dependent nature. This paper presents a predictive analysis for single-path MPI programs that observes a single program execution and then determines whether any other feasible schedule of the program can lead to a deadlock. The analysis works by identifying problematic communication patterns in a dependency graph to form a set of deadlock candidates. The deadlock candidates are filtered by an abstract machine and ultimately tested for reachability by an SMT solver with an efficient encoding for deadlock. This approach quickly yields a set of high probability deadlock candidates useful for reasoning about complex codes and yields higher performance overall in many cases compared to other state-of-the-art analyses. The analysis is sound and complete for single-path MPI programs on a given input.
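As an illustration of a schedule-dependent deadlock, consider this minimal mpi4py program (a hypothetical example, not taken from the paper), in which the outcome depends on which send the wildcard receive matches first:

```python
# Run with: mpiexec -n 3 python wildcard_deadlock.py
# A hypothetical single-path MPI program whose deadlock is schedule-dependent.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    first = comm.recv(source=MPI.ANY_SOURCE)  # may match rank 1 or rank 2
    second = comm.recv(source=1)              # blocks forever if rank 1 already matched
elif rank in (1, 2):
    comm.send(f"hello from {rank}", dest=0)
```

If the wildcard receive happens to match the message from rank 1, the second receive waits for a message that was already consumed, and rank 0 blocks indefinitely; a predictive analysis can flag this risk from a single, non-deadlocking run.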
We propose a novel approach to proving the termination of imperative programs by k-induction. In our approach, the termination proving problem is formalized as a k-inductive invariant synthesis task. On the one hand, k-induction admits weaker invariants than those required by the standard inductive approach. On the other hand, the base case of k-induction, which unrolls the program, provides a stronger precondition for invariant synthesis. As a result, the termination arguments of our approach can be synthesized more efficiently than with the standard method. We implement a prototype of our k-inductive approach, and the experimental results show the effectiveness and efficiency of our approach.
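In the standard formulation (our notation, not necessarily the paper's), proving that $P$ is a $k$-inductive invariant of a transition system with initial states $I$ and transition relation $T$ amounts to discharging a base case over $k$ unrollings and a strengthened induction step:

```latex
\begin{align*}
\text{Base: } & I(s_1) \land \bigwedge_{i=1}^{k-1} T(s_i, s_{i+1})
                \implies \bigwedge_{i=1}^{k} P(s_i) \\
\text{Step: } & \bigwedge_{i=1}^{k} \big( P(s_i) \land T(s_i, s_{i+1}) \big)
                \implies P(s_{k+1})
\end{align*}
```

The step hypothesis assumes $P$ on $k$ consecutive states, so it may succeed with invariants too weak for standard ($k=1$) induction, while the unrolled base case supplies the stronger precondition for synthesis mentioned above.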
Library migration is a challenging problem, where most existing approaches rely on prior knowledge. This can be, for example, information derived from change logs or statistical models of API usage.
This paper addresses a different API migration scenario where there is no prior knowledge of the target library. We have no historical changelogs and no access to its internal representation. To tackle this problem, this paper proposes a novel approach (M$^3$), where probabilistic program synthesis is used to semantically model the behavior of library functions. Then, we use an SMT-based code search engine to discover similar code in user applications. These discovered instances provide potential locations for API migrations.
We evaluate our approach against 7 well-known libraries from varied application domains, learning correct implementations for 94 functions. Our approach is integrated with standard compiler tooling, and we used this integration to evaluate migration opportunities in 9 existing C/C++ applications with over 1MLoC. We discover over 7,000 instances of these functions, of which more than 2,000 represent migration opportunities.
The increasing adoption of the Linux kernel has been sustained by a large and constant maintenance effort, performed by a wide and heterogeneous base of contributors. One important problem that maintainers face in any code base is the rapid understanding of complex data structures. The Linux kernel is written in the C language, which enables the definition of arbitrarily uninformative datatypes, via the use of casts and pointer arithmetic, of which doubly linked lists are a prominent example. In this paper, we explore the advantages and disadvantages of such lists for expressivity, code understanding, and code reliability. Based on our observations, we have developed a toolset that includes inference of descriptive list types and a tool for list visualization. Our tools identify more than 10,000 list fields and variables in recent Linux kernel releases and succeed in typing more than 90% of them. We show how these tools could have been used to detect previously fixed bugs, and we identify 6 new ones.
Developers continuously invent new practices, usually grounded in hard-won experience, not theory. Game theory studies cooperation and conflict; its use will speed the development of effective processes. A survey of game theory in software engineering finds highly idealised models that are rarely based on process data. This is because software processes are hard to analyse using traditional game theory since they generate huge game models. We are the first to show how to use game abstractions, developed in artificial intelligence, to produce tractable game-theoretic models of software practices. We present Game-Theoretic Process Improvement (GTPI), built on top of empirical game-theoretic analysis. Some teams fall into the habit of preferring “quick-and-dirty” code to slow-to-write, careful code, incurring technical debt. We showcase GTPI’s ability to diagnose and improve such a development process. Using GTPI, we discover a lightweight intervention that incentivises developers to write careful code: add a single code reviewer who needs to catch only 25% of kludges. This 25% accuracy is key; it means that a reviewer does not need to examine each commit in-depth, making this process intervention cost-effective.
Link to Publication: https://www.sciencedirect.com/science/article/pii/S0164121219301980
Smartphone vendors use multiple methods to kill the processes of Android apps in order to reduce battery consumption. This motivates developers to find ways to extend the liveness of their apps, hence the name diehard apps in this paper. Although there are blogs and articles illustrating methods to achieve this purpose, there is no systematic research about them. More importantly, little is known about the prevalence of diehard apps in the wild. In this paper, we take a first step toward systematically investigating diehard apps by answering the following research questions. First, why and how can they circumvent the resource-saving mechanisms of Android? Second, how prevalent are they in the wild? In particular, we conduct a semi-automated analysis to explain why existing methods for killing app processes can be evaded, and then systematically present 12 diehard methods. After that, we develop a system named DiehardDetector to detect diehard apps at a large scale. The experimental results of applying DiehardDetector to more than 80k Android apps downloaded from Google Play show that around 21% of apps adopt various diehard methods. Moreover, our system achieves high precision and recall.
The UI-driven nature of Android apps has motivated the development of automated UI analysis for various purposes, such as app analysis, malicious app detection, and app testing. Although existing automated UI analysis methods have demonstrated their capability in dissecting apps' UI, little is known about their effectiveness in the face of app protection techniques, which have been adopted by more and more apps. In this paper, we take a first step toward systematically investigating UI obfuscation for Android apps and its effects on automated UI analysis. In particular, we point out the weaknesses in existing automated UI analysis methods and design 9 UI obfuscation approaches. We implement these approaches in a new tool named UIObfuscator after tackling several technical challenges. Moreover, we feed 3 kinds of tools that rely on automated UI analysis with apps protected by UIObfuscator, and find that their performance drops severely. This work reveals the limitations of automated UI analysis and sheds light on app protection techniques.
Mobile operating systems evolve quickly, frequently updating the APIs that app developers use to build their apps. Unfortunately, API updates do not always guarantee backward compatibility, causing apps to no longer work properly or even crash when running on an updated system. This paper presents FILO, a tool that assists Android developers in resolving backward compatibility issues introduced by API upgrades. FILO both suggests the method that needs to be modified in order to adapt the app to an upgraded API, and reports key symptoms observed in the failed execution to facilitate the fixing activity. Results obtained from the analysis of 12 actual upgrade problems and the feedback produced by early tool adopters show that FILO can practically support Android developers. FILO can be downloaded from https://gitlab.com/learnERC/filo, and its video demonstration is available at https://youtu.be/WDvkKj-wnlQ.
Symbolic execution still faces scalability problems caused by path explosion and constraint-solving overhead. The recently proposed MuSE framework supports exploring multiple paths by generating partial solutions in a single round of solving. In this work, we improve MuSE in two respects. First, we use a lightweight check to reduce redundant partial solutions, avoiding wasteful executions that produce the same results. Second, we introduce online learning to devise an adaptive search strategy for the target programs. The preliminary experimental results indicate that the proposed methods are promising.
The robustness of deep neural networks (DNNs) is critical and challenging to ensure. In this paper, we propose a general data-oriented mutation framework, called Styx, to improve the robustness of DNNs. Styx generates new training data by slightly mutating the existing training data. In this way, Styx preserves the DNN's accuracy on the test dataset while improving its resilience to small perturbations, i.e., improving its robustness. We have instantiated Styx for image classification and proposed pixel-level mutation rules that are applicable to any image classification DNN. We have applied Styx to several commonly used benchmarks and compared it with representative adversarial training methods. The preliminary experimental results indicate the effectiveness of Styx.
Constraint solving is one of the main challenges for symbolic execution. Modern SMT solvers allow users to customize the internal solving procedure through solving strategies. In this extended abstract, we report our recent progress in synthesizing a program-specific solving strategy for the symbolic execution of a program. We propose a two-stage procedure for symbolic execution. In the first stage, we synthesize a solving strategy using deep learning techniques. The strategy is then used in the second stage to improve the performance of constraint solving. The preliminary experimental results indicate that our method is promising.
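For concreteness, a solving strategy in z3 is a composition of tactics; the sketch below shows the kind of artifact being synthesized (the particular tactic chain is an arbitrary example, not a learned, program-specific strategy):

```python
from z3 import Int, Tactic, Then

# An example z3 solving strategy: a fixed chain of tactics. The approach
# described above would synthesize such a chain per program.
strategy = Then(Tactic("simplify"), Tactic("solve-eqs"), Tactic("smt"))
solver = strategy.solver()  # use the tactic chain as a solver

x, y = Int("x"), Int("y")
solver.add(x + y == 10, x > y)
print(solver.check(), solver.model())
```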
Since iOS introduced a privacy feature that notifies users whenever an app reads or writes their clipboard, a plethora of top apps have been reported to frequently access the clipboard without user consent. On Android, however, the lack of monitoring and control over applications' access to clipboard data leaves users blind to the potential leakage of private information from the clipboard, raising severe security and privacy concerns. In this preliminary work, we envisage and investigate an approach to (i) dynamically detect clipboard access behaviour, and (ii) determine privacy leaks via static data flow analysis, in which we enhance the results of taint analysis with call graph concatenation to enable leakage source backtracking. Our preliminary results indicate that the proposed method can expose clipboard data leakage, as substantiated by our discovery of a popular app, Sogou Input, directly monitoring and transferring user data in the clipboard to backend servers.
In software development, issue tracking systems are widely used to manage bug reports. In such a system, a bug report can be filed, diagnosed, assigned, and fixed. In the standard process, a bug can be resolved as \emph{fixed}, \emph{invalid}, \emph{duplicated}, or \emph{won't fix}. Although these resolutions are well defined and easy to understand, a bug report can also end with a less known resolution, \emph{i.e.}, \emph{workaround}. Compared with other resolutions, the definition of workarounds is more ambiguous. Beyond the problem that is reported in a bug report, the resolution of a workaround raises further questions. Some of these questions are important for users, especially programmers who build their projects upon others (\emph{e.g.}, libraries). Although some early studies have analyzed API workarounds, many research questions on workarounds remain open. For example, which bugs are resolved as workarounds? Why is a bug report resolved as a workaround? What are the repairs of workarounds? In this experience paper, we conduct the first empirical study to explore these research questions. In particular, we analyzed 221 real workarounds collected from Apache projects. Our results lead to ten findings and answers to all of the above questions. For example, we find that most bug reports are resolved as workarounds because their problems arise across projects (38.01%) or reside in the environment (21.27%). Although the problems of some workarounds (37.56%) reside in the project where they are reported, they are difficult to fix fully and perfectly. Our findings are useful for understanding workarounds and for improving software projects and issue trackers.
Artificial Intelligence (AI) has been widely used in smart systems, such as smart health and smart agriculture, to enable intelligent services for people and other smart systems. At present, most smart systems are based on cloud computing, and the massive data generated at the smart end devices must be transferred to the cloud, where the AI models are deployed. A big challenge for smart system engineers is therefore that cloud-based smart systems often face issues such as network congestion and high latency. In recent years, mobile edge computing (MEC) has become a promising solution that supports computation-intensive tasks, such as deep learning, by offloading computation to servers located at the local network edge. To take full advantage of MEC, effective collaboration between the end device and the edge server is essential. However, this is a brand new and challenging issue for smart system engineers. In this paper, as an initial investigation, we propose Edge4Sys, a device-edge collaborative framework for MEC-based smart systems. Specifically, we use the deep-learning-based user identification process in a MEC-based UAV (Unmanned Aerial Vehicle) delivery system as a case study to demonstrate the effectiveness of the proposed framework, which can significantly reduce network traffic and response time.
Traditionally, software comprehension relies on 2D media, such as reading through code or looking at charts on screens. Some recent approaches advocate exploring 3D media, such as augmented or virtual reality (AR/VR), for a richer experience when understanding software and its internal relationships. However, there is a dearth of objective studies that compare such 3D representations with their traditional 2D counterparts in the context of software comprehension. In this paper, we present an evaluation study that quantitatively and qualitatively compares 2D and 3D software representations with respect to typical comprehension tasks. For the 3D medium, we utilize an AR-based approach for 3D visualizations of a software system (XRaSE), while the 2D medium comprises textual IDEs and 2D graph representations. The study, conducted with 20 professional developers, shows that for most comprehension tasks the developers perform much better using the 3D representation, especially in terms of speed and recollection, while also displaying reduced cognitive load and better engagement.
Smart home systems are becoming increasingly popular, and engineering such systems has hence become a prominent software engineering challenge. In this engineering paradigm, users are often interested in considering sensor states while they are performing various activities. Existing works have proposed initial efforts on incremental development methods with activity-oriented requirements. However, there is no systematic way of ensuring the reliability and security of such systems, which may be developed incrementally by various developers and may execute in a complex environment. Some properties, especially those involving metric timing constraints, need to be satisfied. In this paper, we introduce ACTOM, a framework for the identification of activity-oriented requirements and for runtime verification. ACTOM supports the development of the mapping between activities and required sensor readings (the activity-sensor mapping) based on the physical requirements of activities. At runtime, ACTOM receives the results of activity recognition and can trigger actuators to provide the required physical conditions for the activities, as determined by the activity-sensor mapping. Moreover, ACTOM continuously monitors whether the activity-sensor mapping holds over a time period during the activity. We plan to evaluate ACTOM with a case study in a smart home in which various activities are observed, to show its effectiveness. The end product will be a systematic framework that facilitates the development of activity-oriented requirements and monitors properties related to metric timing constraints, improving reliability and security.
The advances in machine learning (ML) have stimulated the integration of ML capabilities into software systems and services. However, there is a tangible gap between software engineering and machine learning practices that is delaying the progress of intelligent service development. Software organisations are devoting effort to adjusting software engineering processes and practices to facilitate the integration of machine learning models. Machine learning researchers, in turn, are focusing on improving the interpretability of machine learning models to support overall system robustness. Our research focuses on bridging this gap through a methodology that evaluates the robustness of machine learning-enabled software systems. In particular, this methodology will automate the evaluation of the robustness properties of software systems against dataset shift problems in ML. It will also feature a notification mechanism that facilitates the debugging of ML components.
Automated test generators, such as search-based software testing (SBST) techniques, replace the tedious and expensive task of manually writing test cases. SBST techniques are effective at generating tests with high code coverage. However, is high code coverage sufficient to maximise the number of bugs found? We argue that SBST should be focused to search for test cases in defective areas rather than in non-defective areas of the code, in order to maximise the likelihood of discovering bugs. Defect prediction algorithms give useful information about the bug-prone areas of software. We therefore formulate the objective of this thesis: \textit{improve the bug detection capability of SBST by incorporating defect prediction information}. To achieve this, we devise two research objectives: 1) develop a novel approach (SBST$_{CL}$) that allocates the time budget to classes based on their likelihood of being defective, and 2) develop a novel strategy (SBST$_{ML}$) to guide the underlying search algorithm (i.e., a genetic algorithm) towards the defective areas of a class. Through an empirical evaluation on 434 real reported bugs in the Defects4J dataset, we demonstrate that our novel approach, SBST$_{CL}$, is significantly more efficient than state-of-the-art SBST when given a tight time budget in a resource-constrained environment.
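A minimal sketch of the first objective, assuming per-class defect probabilities from an off-the-shelf defect predictor; the proportional split shown is an illustrative assumption, not the thesis's exact allocation formula:

```python
def allocate_budget(defect_probs, total_budget_sec):
    # Split the total test-generation budget across classes in proportion
    # to each class's predicted probability of being defective.
    total = sum(defect_probs.values())
    return {cls: total_budget_sec * p / total
            for cls, p in defect_probs.items()}

# Hypothetical predictor output for three classes and a 10-minute budget.
print(allocate_budget({"Parser": 0.7, "Cache": 0.2, "Utils": 0.1}, 600))
# -> {'Parser': 420.0, 'Cache': 120.0, 'Utils': 60.0}
```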
Programs are becoming increasingly complex and typically contain an abundance of unneeded features, which could severely harm performance and security. Recently, we have witnessed a surge of debloating techniques that aim to create a reduced version of a program by eliminating the unneeded features therein. To debloat a program, most existing techniques require a usage profile of the program, typically provided as a set of inputs $I$. Unfortunately, these techniques tend to generate a reduced program that is overfitted to $I$ and thus fails to behave correctly for other inputs. To address this limitation of existing techniques, we propose DomGad, which has two main advantages over existing debloating approaches. First, it produces a reduced program that is guaranteed to work for entire subdomains, rather than for specific inputs. Second, it uses stochastic optimization to generate reduced programs that achieve a close-to-optimal trade-off between size reduction and generality (i.e., extent to which the reduced program is able to correctly handle inputs in its whole domain). To assess the effectiveness of DomGad, we applied our approach to a benchmark of ten Unix utility programs. Our results are promising, as they show that DomGad could produce debloated programs that achieve, on average, a 50% code reduction and 95% generality. Our results also show that DomGad performs well when compared with two state-of-the-art debloating approaches.
The research community has long recognized a complex interrelationship between test set size, test adequacy criteria, and test effectiveness in terms of fault detection. However, there is substantial confusion about the role and importance of controlling for test set size when assessing and comparing test adequacy criteria. This paper makes the following contributions: (1) A review of contradictory analyses of the relationship between fault detection, test set size, and test adequacy criteria. Specifically, this paper addresses the supposed contradiction of prior work and explains why test set size is neither a confounding variable, as previously suggested, nor an independent variable that should be experimentally manipulated. (2) An explication and discussion of the experimental design and sampling strategies of prior work, together with a discussion of conceptual and statistical problems, and specific guidelines for future work. (3) A methodology for comparing test-adequacy criteria on an equal basis, which accounts for test set size by treating it as a covariate. (4) An empirical evaluation that compares the effectiveness of coverage-based and mutation-based testing to one another and random testing. Additionally, this paper proposes probabilistic coupling, a methodology for approximating the representativeness of a set of test goals for a given set of real faults.
WebAssembly is a new programming language built for better performance in web applications. It defines a binary code format and a text representation for the code. At first glance, WebAssembly files are not easily understandable to human readers, regardless of their experience level. As a result, distributed third-party WebAssembly modules need to be implicitly trusted by developers, as verifying their functionality requires significant effort. To this end, we develop WASim, an automated classification tool that identifies the purpose of WebAssembly programs by analyzing module-level features. It assigns purpose labels to a module in order to assist developers in understanding the binary module. The code for WASim is available at https://github.com/WASimilarity/WASim and a video demo is available at https://youtu.be/usfYFIeTy0U.
In this paper, we present Sosed, a tool for discovering similar software projects. We use fastText to compute the embeddings of sub-tokens into a dense space for 120,000 GitHub repositories in 200 languages. Then, we cluster the embeddings to identify groups of semantically similar sub-tokens that reflect topics in source code. We use a dataset of 9 million GitHub projects as a reference search base. To identify similar projects, we compare the distributions of clusters among their sub-tokens. The tool receives an arbitrary project as input, extracts sub-tokens in the 16 most popular programming languages, computes the cluster distribution, and finds the projects with the closest distributions in the search base. We labeled the sub-token clusters with short descriptions to enable Sosed to produce interpretable output.
Sosed is available at https://github.com/JetBrains-Research/sosed/. The tool demo is available at https://www.youtube.com/watch?v=LYLkztCGRt8. The multi-language extractor of sub-tokens is available separately at https://github.com/JetBrains-Research/identifiers-extractor/.
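To make the comparison step concrete, here is a sketch under assumed data shapes; the L1 distance used below is an assumption, and the tool may use a different similarity measure:

```python
from collections import Counter

def cluster_distribution(cluster_ids):
    # Represent a project by the relative frequency of its sub-token clusters.
    counts = Counter(cluster_ids)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def l1_distance(p, q):
    # Compare two projects by the L1 distance of their cluster distributions
    # (an assumption; Sosed may use a different measure).
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))

query = cluster_distribution([3, 3, 17, 42, 3])    # clusters of the query project
candidate = cluster_distribution([3, 17, 17, 42])  # clusters of an indexed project
print(l1_distance(query, candidate))
```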
Automated test case generation tools have been successfully proposed to reduce the amount of human and infrastructure resources required to write and run test cases. However, recent studies demonstrate that the readability of generated tests is very limited due to (i) uninformative identifiers and (ii) lack of proper documentation. Prior studies proposed techniques to improve test readability by generating either natural language summaries or meaningful method names. While these approaches are shown to improve test readability, they are also affected by two limitations: (1) generated summaries are often perceived as too verbose and redundant by developers, and (2) readable tests require not only proper method names but also meaningful identifiers (within-method readability). In this work, we combine template-based methods and Deep Learning (DL) approaches to automatically generate test case scenarios (elicited from natural language patterns of test case statements) and to train DL models on path-based representations of source code to generate meaningful identifier names. Our approach, called DeepTC-Enhancer, recommends documentation and identifier names with the ultimate goal of enhancing the readability of automatically generated test cases. An empirical evaluation with 36 external and internal developers shows that (1) DeepTC-Enhancer significantly outperforms the baseline approach for generating summaries and performs on par with the baseline approach for test case renaming, (2) the transformations proposed by DeepTC-Enhancer result in a significant increase in the readability of automatically generated test cases, and (3) there is a significant difference in feature preferences between external and internal developers.
Specification mining in general, and inferring the behavior model of a running system in particular, are quite useful for several automated software engineering tasks, such as program comprehension, anomaly detection, and testing. Most existing dynamic model inference techniques are white-box, i.e., they require the source code to be instrumented to obtain run-time traces. However, in many systems, access to source code is not possible for the parts of the program that use third-party binaries and off-the-shelf components. One useful scenario for automated black-box behaviour inference is software control units (such as autopilots), where the software system's reactions over time change based on the inputs. Run-time state models of such systems are very powerful means for anomaly detection and debugging. Unfortunately, most black-box techniques that detect state changes over time are either uni-variate (which limits their application in real-world systems) or are weak with respect to learning from past behaviour. Therefore, in this paper, we propose a hybrid deep neural network that accepts as input a set of time series, one per input signal of the system, and applies a set of convolutional and recurrent layers to learn both the non-linear correlations between signals and the patterns over time. We have applied our approach to a real UAV autopilot solution from our industry partner, with half a million lines of C code. We ran 888 random recent test cases of the system and inferred states over time. We compared our results with several traditional time series change point detection techniques and showed that our approach improves their performance by 88% to 102% in finding state change points, measured by F1 score. We also showed that our state classification algorithm provides, on average, a 90.45% F1 score, improving on traditional classification algorithms by 7% to 17%.
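A minimal sketch of such a hybrid network in Keras, with convolutions over the signal dimension followed by recurrence over time; the layer sizes and exact topology are assumptions, not the paper's architecture:

```python
import tensorflow as tf

n_signals, window, n_states = 8, 200, 4  # assumed shapes: 8 signals, 4 states
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(window, n_signals)),
    # Convolutions learn non-linear correlations across input signals.
    tf.keras.layers.Conv1D(32, kernel_size=5, activation="relu"),
    tf.keras.layers.Conv1D(32, kernel_size=5, activation="relu"),
    # Recurrence learns patterns over time.
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(n_states, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```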
This paper investigates the problem of classifying Android applications into malicious and benign. We analyze the performance of a popular malware detection tool, Drebin, on malware datasets commonly used in an academic setup and show that the high detection accuracy often stems from learning benign rather than malicious indicators. That, effectively, turns the malware detection tools into benign app detectors. Yet, in practice, malware samples are often larger and can exhibit many behaviors similar to those of benign apps. Under such a challenging setup, looking for benign indicators becomes ineffective and the ability of the tools to detect malware degrades substantially.
In this paper, we propose an approach for identifying malicious portions of an app in the presence of numerous benign features, effectively eliminating "noise" and focusing the detection on truly malicious indicators. We also propose a novel metric estimating the "reasons" for correct malware classification, i.e., whether it is based on the presence of malicious indicators or the absence of benign ones. We show that our proposed approach is effective both in increasing the "standard" classification accuracy and in making more "justifiable" classification decisions.
Given that creating an in-house test lab is expensive and time-consuming to maintain, companies and app developers often use device clouds to test their apps. To ensure that this service is functional, cloud organizations need to use management software to control test executions and perform frequent system updates to patch possible issues with the devices. In this paper, we present a preliminary study investigating issues and highlighting research opportunities for managing and maintaining device clouds. In the study, we analyzed more than 12 million test executions on 110 devices. We found that the management software of the cloud infrastructure we considered affected some test executions, and almost all the cloud devices had at least one security-related issue.
Whenever software components process personal or private data, appropriate data protection mechanisms are mandatory. An essential factor in achieving trust and transparency is not to give preference to a single party, but to make it possible to audit data usage in an unbiased way. The scenario in mind for this contribution involves (i) users bringing in sensitive data that they want kept safe, (ii) service developers building software-based services whose Intellectual Property (IP) they desire to protect, and (iii) platform providers wanting to be trusted and to be able to rely on the component developers' integrity. The authors see these competing interests as an insufficiently resolved field of tension, one that can be relaxed by representing software components at a suitable level of transparency, giving insights without exposing every detail.
The design and development of production-grade microservice backends is a tedious and error-prone task. In particular, they must be capable of handling all Functional Requirements (FRs) and all Non-Functional Requirements (NFRs), such as security, including all operational requirements, such as monitoring. This becomes even more difficult when there are many clients with different roles, linked to diverse (non-)functional requirements, and many existing services are involved that must be considered consistently. In this paper, we present a model-driven approach that automatically generates client-specific, production-grade backends by incorporating previously expressed architectural knowledge from an interpretable specification of the targeted APIs and the NFRs.
Formal specifications in \textsf{Alloy} are organized around user-defined data domains, associated with \emph{signatures}, with almost no support for built-in datatypes. This minimality of the language's built-in datatypes is one of its main features, as it contributes to the automated analyzability of models. One of the few built-in datatypes available in Alloy specifications is integers, whose SAT-based treatment allows only for small bit-widths. In many contexts, where relational datatypes dominate, the use of integers may be auxiliary, e.g., in cardinality constraints and other features. However, as the applications of \textsf{Alloy} grow, e.g., with the use of the language and its tool support as a backend engine for different analysis tasks, the provision of efficient support for numerical datatypes becomes a necessity. In this work, we present our current, preliminary approach to providing an efficient, scalable, and user-friendly extension of \textsf{Alloy} with arithmetic support for numerical datatypes. Our implementation allows for arithmetic with varying precisions and is implemented via standard \textsf{Alloy} constructions, thus resorting to SAT solving for resolving arithmetic constraints in models.
Software reliability is a primary concern in the construction of software, and thus a fundamental component in the definition of software quality. Analyzing software reliability requires a \emph{specification} of the intended behavior of the software under analysis. Unfortunately, software often lacks such specifications. This issue seriously diminishes the analyzability of software with respect to its reliability. Thus, finding novel techniques to capture the intended software behavior in the form of specifications would allow us to exploit them for automated reliability analysis.
Our research focuses on the application of learning techniques to automatically distinguish correct from incorrect software behavior. The aim here is to decrease the developer’s effort in specifying oracles, and instead \emph{generating} them from actual software behaviors.
The ability to rapidly learn and adapt to evolving user needs is key to modern business success. Existing methods use text mining and machine learning techniques to analyze user comments and feedback, and are often constrained by heavy reliance on manually codified rules or insufficient training data. Multitask learning (MTL) is an effective approach with many successful applications, and it has the potential to address these limitations of requirements analysis tasks. In this paper, we propose a deep MTL-based approach, DEMAR, to address these limitations when discovering feature requests from massive issue reports and annotating the sentences in support of automated requirements analysis. DEMAR consists of three main phases: 1) the data augmentation phase, for data preparation and for allowing data sharing beyond single-task learning; 2) the model construction phase, for constructing the MTL-based model for the requirements discovery and requirements annotation tasks; and 3) the model training phase, enabling eavesdropping via a shared loss function between the two related tasks. Evaluation results from eight open-source projects show that the proposed multitask learning approach outperforms two state-of-the-art approaches (FRA and CNC) and six common machine learning algorithms across both the requirements discovery and requirements annotation tasks, with a precision of 91% and a recall of 83% for the requirements discovery task, and an overall accuracy of 83% for the requirements annotation task. The proposed approach provides a novel and effective way to jointly learn two related requirements analysis tasks. We believe that it also sheds light on further directions for applying multitask learning to other related software engineering problems.
Code comment generation, which aims to automatically generate natural language descriptions for source code, is a crucial task in the field of automatic software development. Traditional comment generation methods use manually crafted templates or information retrieval (IR) techniques to generate summaries for source code. In recent years, neural network-based methods, which leverage the acclaimed encoder-decoder deep learning framework to learn comment generation patterns from a large-scale parallel code corpus, have achieved impressive results. However, these emerging methods only take code-related information as input. Software reuse is common in software development, meaning that the comments of similar code snippets can be helpful for comment generation. Inspired by IR-based and template-based approaches, in this paper we propose a neural comment generation approach that uses the existing comments of similar code snippets as exemplars to guide comment generation. Specifically, given a piece of code, we first use an IR technique to retrieve a similar code snippet and treat its comment as an exemplar. Then we design a novel seq2seq neural network that takes the given code, its AST, the similar code, and the exemplar as input, and leverages the information from the exemplar to assist in generating the target comment, based on the semantic similarity between the source code and the similar code. We evaluate our approach on a large-scale Java corpus containing about 2M samples, and the experimental results demonstrate that our model outperforms the state-of-the-art methods by a substantial margin.
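The retrieval step alone can be sketched as follows; TF-IDF over code tokens is an assumed stand-in for whatever IR technique the paper uses:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Tiny hypothetical corpus of (code, comment) pairs used as the search base.
corpus_code = ["int max(int a, int b) { return a > b ? a : b; }",
               "void sort(int[] xs) { Arrays.sort(xs); }"]
corpus_comments = ["Returns the larger of two values.",
                   "Sorts the array in ascending order."]

vec = TfidfVectorizer(token_pattern=r"\w+")
index = vec.fit_transform(corpus_code)

# Retrieve the most similar snippet and use its comment as the exemplar.
query = "long max(long a, long b) { return a > b ? a : b; }"
sims = cosine_similarity(vec.transform([query]), index)[0]
exemplar = corpus_comments[sims.argmax()]
print(exemplar)  # -> "Returns the larger of two values."
```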
Developers write logging statements to generate logs and record system execution behaviors to assist in debugging and software maintenance. However, deciding where to insert logging statements is a crucial yet challenging task. On one hand, logging too little may increase the maintenance difficulty due to missing important system execution information. On the other hand, logging too much may introduce excessive logs that mask the real problems and cause significant performance overhead. Prior studies provide recommendations on logging locations, but such recommendations are only for limited situations (e.g., exception logging) or at a coarse-grained level (e.g., method level). Thus, properly helping developers decide finer-grained logging locations for different situations remains an unsolved challenge. In this paper, we tackle the challenge by first conducting a comprehensive manual study on the characteristics of logging locations in seven open-source systems. We uncover six categories of logging locations and find that developers usually insert logging statements to record execution information in various types of code blocks. Based on the observed patterns, we then propose a deep learning framework to automatically suggest logging locations at the block level. We model the source code at the code block level using the syntactic and semantic information. We find that: 1) our models achieve an average of 85.6% balanced accuracy when suggesting logging locations in blocks; 2) our cross-system logging suggestion results reveal that there might be an implicit logging guideline across systems. Our results show that we may accurately provide finer-grained suggestions on logging locations, and such suggestions may be shared across systems.
API misuses cause significant problems in software development. Existing methods detect API misuses against frequent API usage patterns mined from codebases. They make the naive assumption that API usage that deviates from the most frequent API usage is a misuse. However, there is a big knowledge gap between API usage patterns and API usage caveats in terms of comprehensiveness, explainability, and best practices. In this work, we propose a novel approach that detects API misuses directly against API caveat knowledge, rather than API usage patterns. We develop open information extraction methods to construct a novel API-constraint knowledge graph from API reference documentation. This knowledge graph explicitly models two types of API-constraint relations (call-order and condition-checking) and enriches return and throw relations with return conditions and exception triggers. It empowers the detection of three types of frequent API misuses (missing calls, missing condition checking, and missing exception handling), whereas existing detectors mostly focus only on missing calls. As a proof of concept, we apply our approach to the Java SDK API Specification. Our evaluation confirms the high accuracy of the extracted API-constraint relations. Our knowledge-driven API misuse detector achieves 0.60 (68/113) precision and 0.28 (68/239) recall for detecting Java API misuses in the API misuse benchmark MuBench. This performance is significantly higher than that of existing pattern-based API misuse detectors. A pilot user study with 12 developers shows that our knowledge-driven API misuse detection is very promising in helping developers avoid API misuses and debug the bugs they cause.
Code completion is one of the most useful features of Integrated Development Environments (IDEs); it can accelerate software development by suggesting the next probable token based on the contextual code in real time. Recent studies have shown that statistical language modeling techniques can improve the performance of code completion tools by learning from large-scale software repositories. However, these models suffer from two major drawbacks: a) existing research uses static embeddings, which map a word to the same vector regardless of its context; the differences in the meaning of a token in varying contexts are lost when each token is associated with a single representation; and b) existing LM-based code completion models perform poorly on completing identifiers, and the type information of identifiers is ignored in most of these models. To address these challenges, in this paper we develop a multi-task learning based pre-trained language model for code understanding and code generation with a Transformer-based neural architecture. We pre-train it with hybrid objective functions that incorporate both code understanding and code generation tasks. We then fine-tune the pre-trained model on code completion. During completion, our model does not directly predict the next token. Instead, we adopt multi-task learning to predict the token and its type jointly and utilize the predicted type to assist the token prediction. Experimental results on two real-world datasets demonstrate the effectiveness of our model compared with state-of-the-art methods.
More and more new technologies are being used in test development. Among them, automatic test generation, a promising technology for improving the efficiency of unit testing, does not yet perform satisfactorily in practice. Test recommendation, like code recommendation, is another feasible technology for supporting efficient unit testing and is receiving more and more attention. In this paper, we develop a novel system, HomoTR, which implements online test recommendation by measuring the homology of two methods. If a new method under test shares homology with an existing method that has tests, HomoTR recommends those tests to the new method. Preliminary experiments show that HomoTR can quickly and effectively recommend test cases that help testers improve testing efficiency. Moreover, HomoTR has been successfully integrated into the MoocTest platform, so it can also execute the recommended tests automatically and visualize the testing results (e.g., branch coverage) in a user-friendly way to help testers understand the testing process. The demo video of HomoTR can be found at https://youtu.be/_227EfcUbus.
We describe an online approach to SMT solver selection using nearest neighbor classification and runtime estimation. We implement and evaluate our approach in MedleySolver, finding that it makes nearly optimal selections and evaluates a dataset of queries three times faster than any individual solver.
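A minimal sketch of the selection idea, with invented feature vectors and solver labels (the feature set and neighbor count are assumptions, not MedleySolver's actual design):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical features of past SMT queries (e.g., formula size, assertion
# count, quantifier flag), each labeled with the solver that was fastest on it.
features = np.array([[120, 3, 0], [8, 1, 1], [950, 7, 0], [12, 2, 1]])
fastest = ["z3", "cvc4", "z3", "boolector"]

selector = KNeighborsClassifier(n_neighbors=1).fit(features, fastest)
print(selector.predict([[900, 6, 0]]))  # route this new query to -> ['z3']
```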
The use of data-hungry deep learning algorithms to augment the performance of cloud-deployed software services calls for the constant relaying of private user data over a network that is susceptible to attack from malicious agents; it also limits the extent to which data can be collated in a central repository to train centralized machine learning models. To encourage anonymized and decentralized training of models in such scenarios, we propose CrossPriv, a user-privacy-sensitive model that enlists the characteristics of cross-silo federated software deployed across the clients participating in a cross-silo federated learning (FL) setup. We demonstrate the efficacy of the model by training a deep learning model that detects pneumonia from X-rays using training data hosted at two completely different silos, without sharing their raw data. We specify the client- and server-side features of the deployed service while clearly defining the pipeline of the cross-silo federated learning architecture.
Most neural program synthesis approaches construct encoder-decoder models to learn a probability distribution over the space of programs. Two drawbacks of such neural program synthesis are that the synthesis scale is relatively small and that the correctness of the synthesis result cannot be guaranteed. We address these problems by constructing a framework that analyzes and solves them along three dimensions: program space description, model architecture, and result processing. Experiments show that the scalability and precision of synthesis are improved in every dimension.
Second place SRC - Undergraduate
Natural language comments are like bridges between human logic and software semantics. Developers use comments to describe the function, implementation, and properties of code snippets. This kind of connection contains rich information, such as the potential types of a variable and the pre-conditions of a method, among other things. In this paper, we categorize comments and use natural language processing techniques to extract information from them. Based on the semantics of programming languages, different rules are built for each comment category to systematically propagate comments among code entities. We then use the propagated comments to check the consistency between code usage and comments. Our demo system finds 37 bugs in real-world projects, 30 of which have been confirmed by the developers. Besides bugs in the code, we also find 304 defective comments: 12 of them are misleading and 292 of them are incorrect. Moreover, among the 41,573 comments we propagate, 87 are for private native methods, which previously had neither code nor comments. We also conduct a user study, in which we find that propagated comments are as good as human-written comments along three dimensions: consistency, naturalness, and meaningfulness.
Third place SRC - Graduate
This paper presents a static vulnerability detection and patching framework that works at both the source code and binary levels. It collects known vulnerability information at the source code level and automatically identifies and extracts vulnerability information at the binary level. Using the known vulnerabilities, it matches similar vulnerable functions and filters out the ones that have already been patched in the target program. For the remaining vulnerable functions, the framework tries to generate hot patches by learning from the source code.
With the influx of Web 3.0, the focus in Big Data Analytics has shifted towards modelling highly interconnected data and analysing the relationships between them. Graph databases befit the requirements of Big Data Analytics, yet organizations still depend on relational databases. A major roadblock to the industry-wide adoption of graph databases is that a standard query language is still in its inception stage, hindering interoperability between the two technologies. In this research, we propose FLUX, a tool for translating relational database queries into graph database queries.
Winner SRC - Graduate
GUI complexity poses a great challenge to GUI implementation. According to our pilot study of crowdtesting bug reports, display issues such as text overlap, blurred screens, and missing images often occur during GUI rendering on different devices due to software or hardware compatibility issues. They negatively influence app usability, resulting in poor user experience. To detect these issues, we propose a novel approach, OwlEye, based on deep learning for modelling the visual information of GUI screenshots. OwlEye can thus detect GUIs with display issues and also locate the detailed region of the issue in a given GUI, guiding developers to fix the bug. We manually construct a large-scale labelled dataset of 4,470 GUI screenshots with UI display issues, and we develop a heuristics-based data augmentation method and a GAN-based data augmentation method to boost the performance of OwlEye. The evaluation demonstrates that OwlEye achieves 85% precision and 84% recall in detecting UI display issues, and 90% accuracy in localizing these issues.
Hardening is the process of configuring IT systems to ensure the security of the systems' components and the data they process or store. The complexity of contemporary IT infrastructures, however, renders manual security hardening and maintenance a daunting task.
In many organizations, security-configuration guides expressed in the SCAP (Security Content Automation Protocol) are used as a basis for hardening, but these guides by themselves provide no means for automatically implementing the required configurations.
In this paper, we propose an approach to automatically extract the relevant information from publicly available security-configuration guides for Windows operating systems using natural language processing. In a second step, the extracted information is verified against the available settings stored in the Windows Administrative Template files, in which the majority of Windows configuration settings are defined.
We show that our implementation of this approach can extract and implement 83% of the rules without any manual effort and 96% with minimal manual effort. Furthermore, we conduct a study with 12 state-of-the-art guides consisting of 2014 rules with automatic checks and show that our tooling can implement at least 97% of them correctly. We have thus significantly reduced the effort of securing systems based on existing security-configuration guides.
We address the problem of identifying performance changes in the evolution of configurable software systems. Finding optimal configurations and configuration options that influence performance is already difficult, but in light of software evolution, configuration-dependent performance changes may lurk in a potentially large number of different versions of the system.
In this work, we combine two perspectives, variability and time, and propose a novel approach to identify configuration-dependent performance changes. In a nutshell, we iteratively sample pairs of configurations and versions and measure the respective performance, which helps us update a model of likelihoods for performance changes. By pursuing a search strategy that selectively and incrementally measures further pairs, we increase the accuracy of the identified change points related to configuration options and interactions.
We have conducted a number of experiments both on controlled synthetic datasets as well as in real-world scenarios with different software systems. Our evaluation demonstrates that we can pinpoint performance shifts to configuration options and interactions as well as commits introducing change points with high accuracy and at scale. Our experiments on three real-world systems confirm the effectiveness and practicality of our approach.
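A toy rendering of the sampling loop could look like this (our own sketch under simplifying assumptions: a fixed measurement budget, uncertainty approximated by sample counts, and a naive mean-difference test instead of the paper's likelihood model):

```python
import random

random.seed(1)

def measure(config, version):
    """Stand-in for benchmarking one (configuration, version) pair."""
    base = 10.0 + (1.5 if config == 2 and version >= 6 else 0.0)  # hidden shift
    return random.gauss(base, 0.1)

configs, versions = range(4), range(10)
samples = {(c, v): [] for c in configs for v in versions}

for _ in range(120):  # fixed measurement budget
    # prefer the cell with the fewest measurements (highest uncertainty)
    cell = min(samples, key=lambda cv: len(samples[cv]))
    samples[cell].append(measure(*cell))

# naive change-point check: compare mean performance of adjacent versions
for c in configs:
    for v in range(1, len(versions)):
        prev = sum(samples[c, v - 1]) / len(samples[c, v - 1])
        curr = sum(samples[c, v]) / len(samples[c, v])
        if abs(curr - prev) > 1.0:
            print(f"possible change point: config {c}, versions {v - 1}->{v}")
```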
Performance bugs are often hard to detect due to their non-fail-stop symptoms. Existing debugging techniques can only detect performance bugs with known patterns (e.g., inefficient loops). The key reason behind this limitation is the lack of a general test oracle. Here, we argue that the performance expectation implied by a configuration change can serve as a strong oracle candidate for performance bug detection. First, prior work shows that most performance bugs are related to configurations. Second, a configuration change reflects users’ expectations about performance changes. If the actual performance behaves differently from users’ expectations, the related code snippet is likely to be problematic.
In this paper, we first conducted a comprehensive study on 173 real-world configuration-related performance bugs (CPBugs) from 12 representative software systems. We then derived seven configuration-related performance properties, which can serve as test oracles in performance testing. Guided by the study, we designed and evaluated an automated performance testing framework, CP-Detector, for detecting real-world configuration-related performance bugs. CP-Detector was evaluated on 12 open-source projects. The results showed that it detected 43 out of 61 existing bugs and reported 13 new bugs.
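One such property can be sketched as a simple oracle (our illustration, not CP-Detector's code; the workload function is a stand-in for a real benchmark run): for an option documented to trade memory for speed, increasing its value should not slow the workload down beyond measurement noise.

```python
def run_workload(cache_mb):
    """Stand-in measurement; in this toy, a larger cache makes things slower,
    violating the documented expectation."""
    return 1.2 if cache_mb < 64 else 2.0

def check_monotonic_speedup(values, runtime_of, tolerance=1.10):
    times = [runtime_of(v) for v in values]
    for (v1, t1), (v2, t2) in zip(zip(values, times),
                                  zip(values[1:], times[1:])):
        if t2 > t1 * tolerance:   # 10% tolerance for measurement noise
            print(f"expectation violated between {v1} and {v2}: "
                  f"{t1:.2f}s -> {t2:.2f}s (candidate CPBug)")

check_monotonic_speedup([16, 64, 256], run_workload)
```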
Ethereum has become a widely used platform for secure, blockchain-based financial and business transactions. However, many identified bugs and vulnerabilities in smart contracts have led to serious financial losses, raising serious concerns about smart contract security. There is thus a significant need to better maintain smart contract code and ensure its high reliability. In this research: (1) we propose an automated deep-learning-based approach to learn structural code embeddings of smart contracts in Solidity, which is useful for clone detection, bug detection, and contract validation. We applied our approach to more than 22K Solidity contracts collected from the Ethereum blockchain; the results show that the clone ratio of Solidity code is around 90%, much higher than in traditional software. We collected 52 known buggy smart contracts belonging to 10 common vulnerability types as our bug database, and our approach identified more than 1,000 clone-related bugs from it efficiently and accurately. (2) Guided by developers’ feedback, we implemented the approach in a web-based tool, SmartEmbed, to make it accessible to Solidity developers. The tool helps developers efficiently identify repetitive smart contracts on the existing Ethereum blockchain and check their contracts against a known set of bugs, improving users’ confidence in contract reliability. We optimized the implementation of SmartEmbed so that it supports developers in real time for practical use. Both the Ethereum ecosystem and individual Solidity developers can benefit from our research. SmartEmbed website: http://www.smartembed.tools Demo video: https://youtu.be/o9ylyOpYFq8 Replication package: https://github.com/beyondacm/SmartEmbed
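The clone-checking step of such an embedding-based approach can be sketched as follows (a toy illustration with random vectors in place of learned embeddings; the contract address and threshold are made up):

```python
import numpy as np

np.random.seed(0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Embeddings of known buggy contracts (random placeholders; SmartEmbed learns
# these from the Solidity code itself, and the address is hypothetical).
bug_db = {"0xdeadbeef": np.random.rand(128)}
candidate = bug_db["0xdeadbeef"] + np.random.normal(0, 0.01, 128)  # near-clone

for addr, emb in bug_db.items():
    if cosine(candidate, emb) > 0.95:          # similarity threshold
        print(f"candidate is a likely clone of buggy contract {addr}")
```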
Cryptographic algorithms are widely used to protect data privacy in many aspects of daily life, from smart cards to cyber-physical systems. Unfortunately, programs implementing cryptographic algorithms may be vulnerable to practical power side-channel attacks, which can infer private data via statistical analysis of the correlation between a device's power consumption and the private data it processes. To thwart these attacks, several masking schemes have been proposed. However, programs that rely on secure masking schemes are not secure a priori. Although some techniques have been proposed for formally verifying masking countermeasures and for quantifying masking strength, they are currently limited to Boolean programs and suffer from low accuracy. In this work, we propose an approach for formally verifying masking countermeasures of arithmetic programs. Our approach is more accurate for arithmetic programs and more scalable for Boolean programs compared to existing approaches. We have implemented our methods in a verification tool, QMVerif, which has been extensively evaluated on cryptographic benchmarks including full AES, DES, and MAC-Keccak. The experimental results demonstrate the effectiveness and efficiency of our approach, especially for compositional reasoning.
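For readers unfamiliar with masking, the textbook first-order schemes look as follows (an illustration of the countermeasures being verified, not QMVerif itself); proving that every intermediate value of a real implementation stays statistically independent of the secret is exactly what makes such verification tools necessary:

```python
import secrets

def boolean_mask(secret: int):
    """Split a secret byte into two Boolean shares; each share alone is
    uniformly random and leaks nothing through its power profile."""
    r = secrets.randbits(8)
    return r, secret ^ r                 # shares (r, secret XOR r)

def arithmetic_mask(secret: int):
    """Arithmetic masking: additive shares modulo 2^8."""
    r = secrets.randbits(8)
    return r, (secret - r) % 256

r, s = boolean_mask(0x2A)
assert r ^ s == 0x2A                     # recombination recovers the secret
r, s = arithmetic_mask(0x2A)
assert (r + s) % 256 == 0x2A
```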
Smart contracts are Turing-complete programs running on the blockchain, and they cannot be modified even after bugs are detected. The Selfdestruct function is the only way to destroy a contract on the blockchain and transfer its remaining Ether balance; many developers therefore use this function to destroy a contract and redeploy a new one when bugs are detected. In this paper, we propose a deep-learning-based method to find security issues in Ethereum smart contracts by identifying the updated version of a destructed contract. After finding the updated versions, we use open card sorting to identify security issues.
Program semantics learning is a vital problem in various AI-for-SE applications, e.g., clone detection and code summarization. Learning to represent programs with Graph Neural Networks (GNNs) has achieved state-of-the-art performance in many applications, e.g., vulnerability identification and type inference. However, there is currently no unified GNN framework covering these distinct applications. Furthermore, most existing GNN-based approaches ignore global relations among nodes, limiting the model's ability to learn rich semantics. In this paper, we propose a unified framework that constructs two types of graphs to capture rich code semantics for various SE applications.
The data race problem is common in interrupt-driven programs, and such races are difficult to find because of complicated interrupt interleavings. Static analysis is a mainstream technology for detecting them; however, the interrupt synchronization mechanism is hard for existing methods to handle, which produces many false alarms. Eliminating false alarms in static analysis is thus the main challenge for precise data race detection. In this paper, we present a framework that combines static analysis with program verification: it performs static analysis to find all potential races, and then verifies every race to eliminate false alarms. Experimental results on related race benchmarks show that our implementation finds all race bugs in the static analysis phase and eliminates all false alarms through program verification.
Winner SRC - Undergraduate
A prior study identified common anti-patterns in automated repair of C programs. In this work, we study whether the same problems exist for Java programs. We manually inspected the plausible patches generated by Java automated repair tools, integrated anti-patterns into jGenProg2, and evaluated it on the Defects4J benchmark. The results show that the average repair time is reduced by 22.6% and the number of generated plausible patches drops from 67 to 29 across 14 bugs in total. Our study provides evidence for the effectiveness of applying anti-patterns in future Java automated repair tools.
Second place SRC - Graduate
This article describes a system for the automatic generation of predictive diagnostic models of CNC machine tools. The system allows machine-tool maintenance specialists to select and operate models based on LSTM neural networks to determine the state of CNC machine elements. Examples of the accuracy achieved by the models in operation are given for determining the state of the cutting tool (more than 95%) and of electric motor bearings (more than 91%).
Third place SRC - Undergraduate
One recent promising direction for reducing the cost of mutation analysis is to identify redundant mutations. We propose a technique to discover redundant mutations by proving subsumption relations among method-level mutation operators using weak mutation testing. We conceive and encode a theory of subsumption relations in Z3 for 40 mutation targets (mutations of an expression or statement). We then prove a number of subsumption relations using the Z3 theorem prover, reducing the number of mutations for several mutation targets. MuJava-M incorporates these subsumption relations into MuJava. We applied MuJava and MuJava-M to 187 classes of 17 projects. Our approach correctly discards mutations in 74.97% of the cases and reduces the number of mutations by 72.52%.
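To illustrate what such a proof looks like, the following z3py sketch (our own toy example over a relational-operator target, not the paper's encoding) shows that, for the original expression a <= b, any test that weakly kills the mutant a < b also kills the mutant a != b, so the latter is redundant:

```python
from z3 import Ints, Solver, Implies, Not, unsat

a, b = Ints("a b")
orig = a <= b
m1 = a < b       # weakly killed iff orig != m1, i.e., when a == b
m2 = a != b      # weakly killed iff orig != m2, i.e., when a >= b

kills_m1 = orig != m1
kills_m2 = orig != m2

# m1 subsumes m2 if every input that kills m1 also kills m2:
s = Solver()
s.add(Not(Implies(kills_m1, kills_m2)))
assert s.check() == unsat   # no counterexample exists: m2 is redundant
print("mutant 'a != b' is subsumed by mutant 'a < b' for original 'a <= b'")
```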
When building enterprise applications on Java frameworks (e.g., Spring), developers often specify components and configure operations with a special kind of XML file named “deployment descriptors (DD)”. Maintaining such XML files is challenging and time-consuming because (1) the correct configuration semantics is domain-specific but usually vaguely documented, and (2) existing compilers and program analysis tools rarely examine XML files. To help developers ensure the quality of DD, this paper presents a novel approach—Xeditor—that extracts configuration couplings (i.e., frequently co-occurring configurations) from DD and adopts the coupling rules to validate new or updated files.
Xeditor has two phases: coupling extraction and bug detection. To identify couplings, Xeditor first mines DD in open-source projects, and extracts XML entity pairs that (i) frequently co-exist in the same files and (ii) hold the same data at least once. Xeditor then conducts customized association rule mining based on the extracted pairs. For bug detection, given a program commit, Xeditor checks whether any new or updated XML file violates the identified couplings; if so, Xeditor reports the violation(s).
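The coupling-extraction phase can be illustrated with a toy version of the mining step (our sketch; the entity names, files, and confidence threshold are made up):

```python
from collections import Counter
from itertools import combinations

# Each set stands for the XML entities found in one deployment descriptor.
files = [
    {"tx-manager", "data-source", "session-factory"},
    {"tx-manager", "data-source"},
    {"data-source"},
]

pair_count, entity_count = Counter(), Counter()
for entities in files:
    entity_count.update(entities)
    pair_count.update(frozenset(p) for p in combinations(sorted(entities), 2))

for pair, n in pair_count.items():
    a, b = tuple(pair)
    conf = n / min(entity_count[a], entity_count[b])
    if conf >= 0.8:   # hypothetical confidence threshold
        print(f"coupling: {a} <-> {b} (confidence {conf:.2f})")
```

A commit that adds one entity of a high-confidence pair without its partner would then be flagged in the bug-detection phase.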
We applied Xeditor to 4,248 DD mined from 1,137 GitHub projects. Based on our manual inspection, Xeditor extracted couplings with high precision (73%). Furthermore, we built a ground-truth dataset based on the manual inspection results and conducted 10-fold cross-validation to evaluate Xeditor's effectiveness at bug detection. On average, Xeditor detected bugs with 92% precision, 96% recall, and 94% accuracy. Finally, we applied Xeditor to the version history of another 478 GitHub projects; it identified 26 truly erroneous XML updates, 15 of which match program changes later committed by the developers.
Understanding the influence of configuration options on performance is key to finding an optimal system configuration, understanding the system, and debugging performance. Prior research has proposed a number of performance-influence modeling approaches, all of which assign scalar values to option influences and model predictions. However, these point estimates falsely imply certainty about an option's influence, neglecting several sources of uncertainty within the assessment process, such as (1) measurement bias, (2) the model representation and learning process, and (3) incomplete data. As a result, different approaches assign different scalar performance values to options and to interactions among them. The true influence is uncertain, yet there is no way to even quantify this uncertainty with state-of-the-art performance modeling approaches.
We propose a novel approach based on probabilistic programming that explicitly models uncertainty for option influences and consequently provides a confidence interval for each prediction alongside the scalar. This way, we can explain, for the first time, why predictions may be erroneous and which options' influences may be unreliable. Our evaluation on 10 real-world subject systems shows that our implementation, P4, yields errors matching the state of the art when considering only the scalar component of the prediction, while achieving competitive accuracy and providing reliable confidence intervals.
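The core idea can be sketched with a conjugate Bayesian linear model (our simplified illustration, not P4's actual probabilistic program; the options, influences, and noise level are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(30, 3)).astype(float)   # 3 binary options
w_true = np.array([5.0, -2.0, 0.5])                  # hidden influences
y = X @ w_true + 10 + rng.normal(0, 0.3, size=30)    # measured performance

X1 = np.hstack([np.ones((30, 1)), X])                # add intercept column
alpha, sigma2 = 1.0, 0.3 ** 2                        # prior precision, noise
S = np.linalg.inv(alpha * np.eye(4) + X1.T @ X1 / sigma2)  # posterior cov.
m = S @ X1.T @ y / sigma2                            # posterior mean

x_new = np.array([1.0, 1, 0, 1])                     # config to predict
mu = x_new @ m
var = x_new @ S @ x_new + sigma2                     # predictive variance
print(f"predicted: {mu:.2f} +/- {1.96 * var ** 0.5:.2f} (95% interval)")
```

The posterior covariance is what turns each point prediction into an interval; an option whose posterior weight has high variance is exactly an "unreliable influence" in the sense above.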
In Open Source Software (OSS) projects, pre-built tools dominate DevOps-oriented pipelines. In practice, a multitude of configuration management, cloud-based continuous integration, and automated deployment tools exist, often more than one for each task. Tools are regularly adopted and abandoned by OSS projects. Prior work has shown that some tool adoptions are preceded by discussions and that tool adoptions can benefit the project. But important questions remain: how do teams decide to adopt a tool? What is discussed before the adoption, and for how long? And what team characteristics determine the adoption?
In this paper, we conduct a large-scale, mixed-methods empirical study to characterize team discussions and discern the team-level determinants of tool adoption into OSS projects' development pipelines. Guided by theories of team and individual motivation and dynamics, we perform exploratory data analyses, conduct deep-dive case studies, and develop regression models to learn the determinants of adoption and discussion length, and the direction of their effects on adoption. From commit and comment traces of large-scale GitHub projects, our models find that prior exposure to a tool and member involvement are positively associated with tool adoption, while longer discussions and the number of newer team members are negatively associated. These results can provide guidance, beyond technical appropriateness, on the timeliness of tool adoption in diverse programmer teams.
The relationship of comments to code, and in particular the task of generating useful comments given the code, has long been of interest. The earliest approaches were based on strong syntactic theories of comment structure and relied on textual templates. More recently, researchers have applied deep-learning methods to this task—specifically, trainable generative translation models that are known to work very well for natural language translation (e.g., from German to English). We carefully examine the underlying assumption here: that the task of generating comments sufficiently resembles the task of translating between natural languages, so that similar models and evaluation metrics can be used. We analyze several recent code-comment datasets for this task: CodeNN, DeepCom, FunCom, and DocString. We compare them with WMT19, a standard dataset frequently used to train state-of-the-art natural language translators, and find some interesting differences between the code-comment data and the WMT19 natural language data. Next, we describe and conduct studies to calibrate BLEU (which is commonly used as a measure of comment quality) using "affinity pairs" of methods from different projects, the same project, the same class, etc. Our study suggests that the current performance on some datasets might need to be improved substantially. We also argue that fairly naive information retrieval (IR) methods do well enough at this task to be considered a reasonable baseline. Finally, we make some suggestions on how our findings might be used in future research in this area.
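For reference, BLEU between a generated comment and the reference comment of an affinity-pair method can be computed as follows (illustrative strings, not the paper's data; smoothing is needed because comments are short and easily miss higher-order n-grams):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "returns the number of elements in this list".split()
candidate = "returns the size of the list".split()

smooth = SmoothingFunction().method1   # avoid zero scores on short texts
score = sentence_bleu([reference], candidate, smoothing_function=smooth)
print(f"BLEU = {score:.3f}")
```

Calibration then asks: what score do two comments of *related but distinct* methods get? If unrelated pairs already score highly, high BLEU for generated comments means less than it appears.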
Software performance is critical to the quality of a software system. Performance bugs can cause significant performance degradation, such as long response times and low system throughput, that ultimately leads to poor user experience. Many modern software projects use bug tracking systems that allow developers and users to report issues they have identified in the software. While bug reports are intended to help developers understand and fix bugs, they are also extensively used by researchers to find benchmarks for evaluating testing and debugging approaches. Researchers often rely on the description in a confirmed performance bug report to reproduce the bug for use in their evaluation. Although researchers spend a considerable amount of time and effort finding usable performance bugs in bug repositories, they often end up with only a few. Reproducing performance bugs is a difficult task even for domain experts such as developers. Compared to functional bugs, performance bugs are substantially more complicated to reproduce because they often manifest only through large inputs and specific execution conditions. The amount of information disclosed in a bug report may not be sufficient for researchers to reproduce the performance bug, which hinders the usability of bug repositories as a source of benchmarks.

Our study targets reproducing performance bugs from the perspective of non-domain experts such as software engineering researchers. One big difference from prior work is that we specifically target confirmed performance bugs and report why software engineering researchers may not succeed in reproducing them, rather than understanding and characterizing non-reproducible bugs from the viewpoint of developers. A failed-to-reproduce performance bug in this work is therefore defined as a developer-confirmed reproducible performance bug that cannot be reproduced by researchers due to a lack of domain knowledge or environment limitations.

The goal of this study is to share our experience as software engineering researchers in reproducing performance bugs by investigating the impact of different factors identified in confirmed performance bug reports in open-source projects. We studied the characteristics of confirmed performance bugs by reproducing them using only the information available in the bug report, to examine the challenges of performance bug reproduction. We spent more than 800 hours over the course of six months studying and reproducing 93 confirmed performance bugs randomly sampled from two large-scale open-source server applications. We (1) studied the characteristics of the reproduced performance bug reports; (2) summarized the causes of failed-to-reproduce confirmed performance bug reports; (3) shared our experience in suggesting workarounds to improve the bug reproduction success rate; and (4) delivered a virtual machine image that contains a set of 17 ready-to-execute performance bug benchmarks.

The findings of our study provide guidance and a set of suggestions to help researchers understand, evaluate, and successfully reproduce performance bugs. We also provide a set of implications for both researchers and practitioners on developing techniques for testing and diagnosing performance bugs, improving the quality of bug reports, and detecting failed-to-reproduce bug reports.
Link to Publication: https://www.sciencedirect.com/science/article/pii/S0164121219301438
Dependencies among software entities are the basis for much software analytics research and many architecture analysis tools. Dynamically typed languages, such as Python, JavaScript, and Ruby, tolerate the lack of explicit type references, making certain syntactic dependencies indiscernible in source code. We call these possible dependencies, in contrast with the explicit dependencies that are directly referenced in source code. Type inference techniques have been widely studied and applied, but existing architecture analysis research and tools have not taken possible dependencies into consideration. The fundamental question is: to what extent will these missing possible dependencies impact architecture analysis? To answer this question, we conducted an empirical study with 105 Python projects, using type inference techniques to manifest possible dependencies. Our study revealed that the architectural impact of possible dependencies is substantial—higher than that of explicit dependencies: (1) file-level possible dependencies account for at least 27.93% of all file-level dependencies, and create dependency structures different from those formed by explicit dependencies alone, with an average difference of 30.71%; (2) adding possible dependencies significantly improves the precision (0.52%~14.18%), recall (31.73%~39.12%), and F1 scores (22.13%~32.09%) of capturing co-change relations; (3) on average, a file involved in possible dependencies influences 28% more files and 42% more dependencies within architectural sub-spaces than a file involved in just explicit dependencies; (4) on average, a file involved in possible dependencies consumes 32% more maintenance effort. Consequently, maintainability scores reported by existing tools make a system written in these dynamic languages appear better modularized than it actually is. This evidence strongly suggests that possible dependencies have a more significant impact than explicit dependencies on architecture quality, and that architecture analysis and tools should assess, and even emphasize, the architectural impact of possible dependencies caused by dynamic typing.
Ask Me Anything
Please see my website
Source code clone detection aims to identify code fragments with similar functionality and has become increasingly important in software engineering. Many approaches have been proposed for detecting code clones; among these, token-based methods are the most scalable but cannot handle semantic clones, since they do not consider program semantics. To address this issue, researchers apply program analysis to distill program semantics into a graph representation and detect clones by matching the graphs. However, such approaches suffer from low scalability since graph matching is typically time-consuming. In this paper, we propose SCDetector, which combines the scalability of token-based methods with the accuracy of graph-based methods for functional clone detection. Given a function's source code, we first extract the control flow graph by static analysis. Instead of traditional heavyweight graph matching, we treat the graph as a social network and apply social-network centrality analysis to compute the centrality of each basic block. We then assign the centrality to each token in a basic block and sum the centrality of the same token across different basic blocks. In this way, a graph is turned into a set of tokens with graph semantics (i.e., centrality), called semantic tokens. Finally, these semantic tokens are fed into a Siamese neural network to train a model, which we use to detect code clones. We evaluate SCDetector on two large datasets of functionally similar code. Experimental results indicate that our system is superior to state-of-the-art methods and that the time cost of SCDetector is more than 14 times lower than that of the state-of-the-art approach in detecting semantic clones.
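The centrality step can be sketched as follows (our illustration using degree centrality as one plausible choice of social-network centrality; the control-flow graph and token sets are toys):

```python
import networkx as nx

# A toy control-flow graph: basic blocks as nodes, control flow as edges.
cfg = nx.DiGraph([("entry", "loop"), ("loop", "loop"),
                  ("loop", "exit"), ("entry", "exit")])
centrality = nx.degree_centrality(cfg)

# Tokens appearing in each basic block.
tokens = {"entry": ["int", "i"], "loop": ["i", "<", "n"]}

# Assign each block's centrality to its tokens and sum per token.
semantic_tokens = {}
for block, toks in tokens.items():
    for t in toks:
        semantic_tokens[t] = semantic_tokens.get(t, 0.0) + centrality[block]
print(semantic_tokens)   # e.g. {'int': ..., 'i': ..., '<': ..., 'n': ...}
```

Two functions can then be compared by feeding their semantic-token vectors into a learned similarity model instead of matching graphs directly.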
Developers are concerned with the comparison of similar APIs in terms of their commonalities and (often subtle) differences. Our empirical study of Stack Overflow questions and API documentation confirms that API comparison questions can often be answered by knowledge contained in API reference documentation. Our study also identifies eight types of API statements that are useful for API comparison. Based on these findings, we propose a knowledge graph based approach APIComp that automatically extracts API knowledge from API reference documentation to support the comparison of a pair of API classes or methods from different aspects. Our approach includes an offline phase for constructing an API knowledge graph, and an online phase for generating an API comparison result for a given pair of API elements. Our evaluation shows that the quality of different kinds of extracted knowledge in the API knowledge graph is generally high. Furthermore, the comparison results generated by APIComp are significantly better than those generated by a baseline approach based on heuristic rules and text similarity, and our generated API comparison results are useful for helping developers in API selection tasks.
Just-In-Time (JIT) defect prediction trains a classification model on historical data to predict bug-introducing changes. However, recent studies have raised concerns about the explainability of predictions in many software analytics applications (i.e., practitioners do not understand why commits are risky and how to improve them). In addition, the adoption of JIT defect prediction remains limited due to a lack of integration into CI/CD pipelines and modern software development platforms (e.g., GitHub). In this paper, we present an explainable JIT defect prediction framework that automatically generates feedback to developers by reporting the riskiness of each commit, explaining why the commit is risky, and suggesting risk-mitigation plans. The proposed framework is integrated into the GitHub CI/CD pipeline as a GitHub application to continuously monitor and analyse the stream of commits in many GitHub repositories. Finally, we discuss usage scenarios and their implications for practitioners. The video demonstration is available at https://youtu.be/HJBzULrS6hE.
The development of efficient data science applications is often impeded by unbearably long execution times and rapid RAM exhaustion. Since API documentation is the primary information source for troubleshooting, we investigate how performance concerns are documented in popular data science libraries. Our quantitative results reveal the prevalence of data science APIs that are documented in performance-related contexts and the infrequent maintenance of such documentation. Our qualitative analyses further suggest that crowd documentation such as Stack Overflow and GitHub is highly complementary to official documentation in terms of API coverage, knowledge distribution, and the specific information found in performance-related discussions. Data science practitioners can benefit from our findings by learning a more targeted search strategy for resolving performance issues. Researchers can be more assured of the advantages of integrating both official and crowd documentation to achieve a holistic view of performance concerns in data science development.
Automated debugging techniques, including fault localization and program repair, have been studied for decades. The existing connection between fault localization and program repair is that fault localization computes the potential buggy elements for program repair to patch. Recently, a pioneering work, ProFL, explored the idea of unified debugging, connecting the two areas in the other direction for the first time to boost both. More specifically, ProFL utilizes the patch execution results from one state-of-the-art repair system, PraPR, to improve state-of-the-art fault localization. In this way, ProFL not only improves fault localization for manual repair, but also extends the application scope of automated repair to all possible bugs (not only the small fraction of bugs that can be fixed automatically). In this work, we perform the first extensive study of the ProFL unified-debugging approach on 16 state-of-the-art program repair systems. Our experimental results on the widely studied Defects4J benchmark suite reveal various practical guidelines for unified debugging, such as: (1) nearly all 16 studied repair systems can contribute to unified debugging despite their varying repair capabilities; (2) repair systems targeting multi-edit patches can bring noise into unified debugging; (3) repair systems with more executed/plausible patches tend to perform better for unified debugging; and (4) unified debugging effectiveness does not rely on the availability of correct patches in automated repair. Based on our study's outcome, we further propose an advanced unified debugging technique, UniDebug++, which can localize over 20% more bugs at Top-1 positions than the state-of-the-art ProFL.
Experience
Third-party libraries (TPLs) have become a significant part of the Android ecosystem. Developers can employ various TPLs with different functionalities to facilitate app development. Unfortunately, the popularity of TPLs also brings new challenges and even threats: TPLs may carry malicious or vulnerable code and can infect many popular apps, posing threats to mobile users. Besides, third-party library code can constitute noise in some detection tasks. Researchers have therefore developed various tools to identify TPLs. However, no existing work has studied these TPL detection tools in detail; different tools focus on different applications and differ in performance, so little is known about them.
To better understand existing TPL detection tools and dissect TPL detection techniques, we present an experience paper that attempts to fill this gap by evaluating and comparing all publicly available TPL detection tools on four criteria: effectiveness, efficiency, resilience to code obfuscation, and ease of use. We reveal their advantages and disadvantages based on our empirical study. The results show that most TPL detection tools achieve high precision but low recall. Based on our evaluation and survey results, we recommend different tools for different application scenarios: LibRadar is suitable for large-scale in-app TPL detection; LibPecker is ideal for identifying obfuscated TPLs; LibScout can identify specific library versions, which can be leveraged to find vulnerable TPLs; and so on. In addition, we enhance these open-source tools by fixing their limitations to improve their detection ability, and we build an extensible framework that integrates all available TPL detection tools, providing an online service for the research community. We make the evaluation dataset and the enhanced tools publicly available. We believe our work provides a clear picture of existing TPL detection techniques and a road map for future directions.
Today, a plethora of different software verification tools exists. When facing a concrete verification task, software developers thus face the problem of algorithm selection. Existing algorithm selectors for software verification typically use hand-picked program features together with (1) manually designed selection heuristics or (2) machine-learned strategies. The first approach suffers from not being transferable to other selection problems, while the second lacks interpretability, i.e., insights into the reasons for choosing particular tools.
In this paper, we propose a novel approach to algorithm selection for software verification. Our approach employs representation learning together with an attention mechanism. Representation learning circumvents feature engineering, i.e., it avoids hand-picking program features, while attention permits a form of interpretability of the learned selectors. We have implemented our approach and experimentally evaluated and compared it with existing approaches. The evaluation shows that representation learning not only outperforms manual feature engineering, but also enables transferability of the learned model to other selection tasks.
In this paper we present lightweight model-based testing of the privacy and authorization concepts of the national portal for electronic health services in Norway (which has over a million visits per month). We developed test models for creating and updating privacy levels and authorization categories using finite-state-machine notation. Our models emphasize not only positive but also negative behavioral aspects of the system. Using edge and edge-pair coverage as acceptance criteria, we identify and systematically derive abstract test cases (high-level user scenarios) from the models. Abstract test cases are further refined into concrete test cases with detailed test steps and concrete test data. Although the derivation of abstract test cases and their transformation into concrete test cases are manual, the execution of concrete test cases and the generation of test reports are automated. In total, we extracted 85 abstract test cases, which resulted in about 80 concrete test cases with over 550 iterations. Automated execution of all test iterations takes about one hour, while manual testing of one iteration takes about five minutes (over a 40-fold speedup). Model-based testing shifted the focus of our intellectual effort to model design rather than test case design, making the derivation of test scenarios systematic and (relatively) straightforward. In addition, model-based testing augmented and extended our traditional quality assurance techniques by facilitating better comprehension of the new privacy and authorization concepts: the graphical models helped improve our understanding of the textual specifications.
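The derivation of abstract tests with edge coverage can be sketched as follows (illustrative states and events, not the portal's actual privacy/authorization model): each abstract test reaches an edge's source state and then fires that edge.

```python
from collections import deque

# Toy FSM: (state, event) -> next state.
fsm = {
    ("LoggedOut", "login"):       "PrivacyDefault",
    ("PrivacyDefault", "raise"):  "PrivacyStrict",
    ("PrivacyStrict", "lower"):   "PrivacyDefault",
    ("PrivacyDefault", "logout"): "LoggedOut",
}

def events_to(start, target):
    """Shortest event sequence from `start` to `target` (BFS)."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, events = queue.popleft()
        if state == target:
            return events
        for (s, e), t in fsm.items():
            if s == state and t not in seen:
                seen.add(t)
                queue.append((t, events + [e]))

# One abstract test case per edge: reach the edge's source, then fire it.
for (source, event) in fsm:
    print(" -> ".join(events_to("LoggedOut", source) + [event]))
```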
When generating GUI tests for Android apps, interactions are typically generated on a separate test computer and then executed on an actual Android device. While this approach is efficient in the sense that apps and interactions execute quickly, the communication overhead between the test computer and the device slows down testing considerably. In this work, we present DD-2, a test generator for Android that tests other apps on the device itself using Android accessibility services. In our experiments, DD-2 proved to be 3.2 times faster than its computer-device counterpart, while sharing the same source code.
Software performance testing is an essential quality assurance mechanism that can identify optimization opportunities. Automating this process requires strong tool support, especially in the case of Continuous Integration (CI), where tests need to run completely automatically and it is desirable to provide developers with actionable feedback. A lack of existing tools means that performance testing is normally left out of the scope of CI. In this paper, we propose a toolchain - PerfCI - to pave the way for developers to easily set up and carry out automated performance testing under CI. Our toolchain allows users to (1) specify performance testing tasks, (2) analyze unit tests on a variety of Python projects, ranging from scripts to full-blown Flask-based web services, by extending a performance analysis framework (VyPR), and (3) evaluate performance data to get feedback on the code. We demonstrate the feasibility of our toolchain by using it on a web service running at the Compact Muon Solenoid (CMS) experiment at the world's largest particle physics laboratory, CERN.
Multi-tier distributed systems are systems composed of several distributed nodes organized in layered tiers. Each tier implements a set of conceptually homogeneous functionalities that provide services to the tier above and use services of the tier below. The distributed computing infrastructure and the connections among the vertical and horizontal structures make multi-tier distributed systems extremely complex and difficult to understand, even for their developers. Indeed, runtime failures are becoming the norm rather than the exception in many multi-tier distributed systems [2–4]. Predicting failures at runtime is essential to trigger automatic and operator-driven reactions that either avoid the incoming failures or mitigate their impact on overall system reliability. Current approaches for predicting failures exploit either anomaly-based or signature-based strategies. Anomaly-based strategies treat behaviors that significantly deviate from normal system behavior as symptoms of failures that may occur in the near future. Signature-based strategies rely on known patterns of failure-prone behaviors, called signatures, to predict failures that match the pattern. Anomaly-based techniques suffer from false positives, while signature-based techniques cannot cope with emerging failures. In our paper [1], we present PreMiSE (PREdicting failures in Multi-tIer distributed SystEms), a novel approach to accurately predict failures and precisely locate the responsible faults in multi-tier distributed systems. PreMiSE combines signature-based with anomaly-based approaches to reduce the false positive rate of anomaly-based approaches and improve the accuracy of signature-based approaches. As illustrated in Figure 1, PreMiSE (i) monitors the status of the system by collecting a large set of performance indicators, which we refer to as Key Performance Indicators (KPIs) (KPI monitoring), (ii) identifies deviations from normal behavior by pinpointing anomalous KPIs with anomaly-based techniques (anomaly detection), and (iii) identifies incoming failures by recognizing symptomatic anomalous KPI sets with signature-based techniques (signature-based failure prediction). We evaluated PreMiSE on a prototype multi-tier distributed architecture that implements telecommunication services. The experimental data indicate that PreMiSE can predict failures and locate faults with high precision and low false positive rates for some relevant classes of faults, thus confirming our research hypotheses.
As Deep Learning (DL) is continuously adopted in many industrial applications, its quality and reliability have started to raise concerns. As in the traditional software development process, testing DL software to uncover defects at an early stage is an effective way to reduce risks after deployment. According to the fundamental assumption of deep learning, DL software provides no statistical guarantees and has limited capability in handling data that go beyond its learned distribution, i.e., out-of-distribution (OOD) data. Recent progress has been made in designing novel testing techniques for DL software, which can detect thousands of errors. However, current state-of-the-art DL testing techniques do not take the distribution of generated test data into consideration. It is therefore hard to judge whether the “identified errors” are indeed meaningful errors for the DL application (i.e., due to quality issues of the model) or outliers that cannot be handled by the current model (i.e., due to a lack of training data).
To fill this gap, we take the first step and conduct a large-scale empirical study, with a total of 451 experiment configurations, 42 DNNs, and over 1.2 million test data instances, to investigate and characterize the capability of DL software from the data-distribution perspective, towards understanding its impact on DL testing techniques. We first perform a large-scale empirical study on five state-of-the-art OOD detection techniques to investigate their performance in distinguishing in-distribution (ID) data from OOD data. Based on the results, we select the best OOD detection technique and investigate the characteristics of the test data generated by different DL testing techniques, i.e., 8 mutation operators and 6 testing criteria. The results demonstrate that some mutation operators and testing criteria tend to guide the generation of OOD test data, while others show the opposite tendency. After identifying ID and OOD errors, we further investigate their effectiveness in DL model robustness enhancement. The results confirm the importance of data-distribution awareness in both the testing and enhancement phases, outperforming distribution-unaware retraining by up to 21.5%. As deep learning follows a data-driven development paradigm, in which behavior depends heavily on the training data, the results of this paper confirm the importance of, and call for, data awareness when designing new testing and analysis techniques for DL software.
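As a point of reference, the classic maximum-softmax-probability baseline, one member of the family of OOD detection techniques that such studies evaluate, can be sketched in a few lines (the logits and threshold are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())       # numerically stable softmax
    return e / e.sum()

def is_ood(logits, threshold=0.5):
    """Flag an input as OOD when the model's top softmax score is low."""
    return softmax(np.asarray(logits)).max() < threshold

print(is_ood([4.1, 0.2, -1.0]))   # confident prediction -> False (ID)
print(is_ood([0.4, 0.3, 0.35]))   # near-uniform scores  -> True (OOD)
```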
Recent advances in deep neural networks (DNNs) have led to object detectors (ODs) that can rapidly process pictures or videos, and recognize the objects that they contain. Despite the promising progress by industrial manufacturers such as Amazon and Google in commercializing deep learning-based ODs as a standard computer vision service, ODs — similar to traditional software — may still produce incorrect results. These errors, in turn, can lead to severe negative outcomes for the users. For instance, an autonomous driving system that fails to detect pedestrians can cause accidents or even fatalities. However, despite their importance, principled, systematic methods for testing ODs do not yet exist.
To fill this critical gap, we introduce the design and realization of MetaOD, a metamorphic testing system specifically designed for ODs to effectively uncover erroneous detection results. To this end, we (1) synthesize natural-looking images by inserting extra object instances into background images, and (2) design metamorphic conditions asserting the equivalence of OD results between the original and synthetic images after excluding the prediction results on the inserted objects. MetaOD is designed as a streamlined workflow that performs object extraction, selection, and insertion. We develop a set of practical techniques to realize an effective workflow, and generate diverse, natural-looking images for testing. Evaluated on four commercial OD services and four pretrained models provided by the TensorFlow API, MetaOD found tens of thousands of detection failures. To further demonstrate the practical usage of MetaOD, we use the synthetic images that cause erroneous detection results to retrain the model. Our results show that the model performance is significantly increased, from an mAP score of 9.3 to an mAP score of 10.5.
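The underlying metamorphic relation can be rendered as follows (our sketch of the idea; detect stands in for a call to an OD service or model, detections are assumed to be (label, bounding-box) pairs, and in practice boxes would be matched by IoU rather than by exact equality):

```python
def overlaps(box, region):
    """Axis-aligned bounding-box intersection test."""
    (x1, y1, x2, y2), (a1, b1, a2, b2) = box, region
    return not (x2 < a1 or a2 < x1 or y2 < b1 or b2 < y1)

def violates_relation(detect, original, synthetic, inserted_region):
    """Detections on the original objects should survive object insertion."""
    before = set(detect(original))
    after = {(label, box) for label, box in detect(synthetic)
             if not overlaps(box, inserted_region)}  # drop the inserted object
    return before != after                           # mismatch = OD error

# Toy stand-in detector: it "loses" the person once a dog is inserted.
detect = lambda img: {"orig": [("person", (10, 10, 50, 80))],
                      "synt": [("dog", (60, 10, 90, 40))]}[img]
print(violates_relation(detect, "orig", "synt", (60, 10, 90, 40)))  # True
```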
Heap-based overflows are still not completely solved even after decades of research. This paper proposes Prober, a novel system for detecting and preventing heap overflows in production environments. Prober leverages a key observation drawn from the analysis of dozens of real bugs: all heap overflows are related to arrays. Based on this observation, Prober focuses only on array-related heap objects instead of all heap objects. Prober utilizes static analysis to label all susceptible call stacks during compilation, and then employs page protection to detect any invalid accesses at runtime. In addition, Prober integrates multiple existing methods to ensure the efficiency of its detection. Overall, Prober introduces almost negligible performance overhead, 1.5% on average. Prober not only stops possible attacks in time, but also reports the faulty instructions, which can guide bug fixes. Prober is ready for deployment due to its effectiveness and low overhead.
Recent advances in web technology have made in-browser cryptomining a viable funding model. However, these services have been abused to launch large-scale cryptojacking attacks that secretly mine cryptocurrency in browsers. To detect them, various signature-based or runtime-feature-based methods have been proposed; however, they can be imprecise or easily circumvented. To this end, we propose MinerRay, a generic scheme to detect malicious in-browser cryptominers. Instead of leveraging unreliable external patterns, MinerRay relies on the essence of cryptomining semantics that differentiates mining from common browsing activities. By abstracting away language and implementation details, MinerRay can handle modules written in different languages. In addition, MinerRay infers user consent to check whether mining is started secretly. MinerRay was evaluated on over 1 million websites. It detected cryptominers on 901 websites, 885 of which secretly start mining without user consent. We also compared MinerRay with five state-of-the-art signature- or behavior-based cryptominer detectors (MineSweeper, CM-Tracker, Outguard, No Coin, and minerBlock) and observed that emerging miners with new signatures or new services were detected by MinerRay but missed by the others. These results show that our proposed technique is effective and robust in detecting evolving cryptominers, yielding more true positives and fewer errors.
This paper presents Solar, a system for automatic synthesis of adversarial contracts that exploit vulnerabilities in a victim smart contract. To make the synthesis tractable, we introduce a query language as well as summary-based symbolic evaluation, which significantly reduces the number of instructions that our synthesizer needs to evaluate symbolically, without compromising the precision of the vulnerability query. We encoded common vulnerabilities of smart contracts and evaluated Solar on the entire data set from Etherscan. Our experiments demonstrate the benefits of summary-based symbolic evaluation and show that Solar outperforms state-of-the-art smart contract analyzers, teEther, Mythril, and ContractFuzzer, in terms of running time, precision, and soundness.
Workshop
Nowadays, robots are widely used in many areas of our lives, such as autonomous warehousing, self-driving vehicles, drones, and industrial automation. Energy is a critical factor for robotic systems, especially for mobile robots, where energy is a finite resource (e.g., autonomous surveillance rovers). Since software is becoming the central focus of modern robotic systems, it is important to understand how it influences the energy consumption of the entire system. However, there is no systematic study of the state of the art in energy efficiency of robotics software that could guide researchers or practitioners in finding solutions and tools to develop robotic systems with energy efficiency in mind.
The goal of this paper is to present a review of existing research on energy efficiency in robotics software. Specifically, we investigate (i) the metrics used for energy efficiency, (ii) the application domains within robotics covered by research on energy efficiency, (iii) the major energy consumers identified within a robotic system, (iv) how existing approaches are evaluated, (v) the energy models used, (vi) the techniques supporting the development of energy-efficient robotics software, and (vii) which quality attributes tend to be traded off when dealing with energy efficiency in robotics. We also provide a replication package to assess, extend, and/or replicate the study.
The results of this work can guide researchers and practitioners in robotics and software engineering in reasoning about, and contributing to, energy-efficient robotics software.
Workshop
Converting ordinary houses into smart homes has been a rising trend in recent years. Smart-home development typically focuses on enhancing the quality of the daily activities of able-bodied people, yet many smart homes are not designed to be user-friendly for differently abled people, such as immobile or bedridden users (disabled people with at least one movable hand). Moreover, due to negligence and forgetfulness, electrical devices are often left switched on without any need, which is one of the most common sources of domestic energy wastage. To overcome these challenges, this research presents an improved smart home design, MobiGO, which uses cameras to capture gestures and smart sockets to deliver gesture-driven commands to home appliances. The camera captures the user's gestures, and the system processes the images through gesture recognition and image processing technologies; the command corresponding to the gesture is then sent to the specific appliance through an IoT device attached to it. The underlying literature survey covers Deep Learning, Convolutional Neural Networks (CNNs), image processing, gesture recognition, smart homes, and IoT. The authors conclude that MobiGO offers a smart home system that is safer and easier to use for people with disabilities. Keywords—Deep Learning; Computer Vision; Gesture; Smart Appliances; Internet of Things
Workshop
This paper presents a preliminary study of the energy consumption of two popular web browsers. In order to properly measure the energy consumption of both environments, we simulate the usage of various applications, with the goal of mimicking typical user interactions and usage.
Our preliminary results show interesting observations, such as which types of interactions generate high peaks of energy consumption and which browser is the most efficient overall. We hope this preliminary study shows users how much browser efficiency can differ, and that it serves as a stepping stone to further broaden this field of study.
Workshop
In recent times, there has been a considerable increase in cloud-based applications and infrastructure. This has led to quicker innovation, more agile businesses, new services available over the internet, improved collaboration, and better security. With the growth of new technologies like blockchain, quantum computing, mobility-focused applications, and edge computing, interest in adopting cloud services has increased. In this paper, we highlight the sustainability metrics and benefits of migrating workloads from on-premise data centers to public clouds. Clouds are elastic, scalable, cost-efficient, robust, and overall a better alternative for hosting client applications and services, and we present how the major Cloud Service Providers (CSPs) are continuously improving their infrastructure for a more energy-efficient cloud. However, with so many factors to weigh, such as the cost of cloud services and the location of the data center, selecting a cloud service provider when moving away from an on-premise data center becomes a tedious task for clients. Hence, we also briefly propose the solution we are currently working on. The final goal is a cross-platform advisor that, based on a wide range of client inputs and a rich repository of current energy-efficient clouds and their sustainability metrics, provides a detailed recommendation about the client's preferred cloud service provider. If the client expresses no preference, the advisor recommends an ideal cloud service provider for the particular workload. The suggested option satisfies the client's constraints and provides an energy-efficient cloud along with a sustainability score, which indicates how much improvement in energy consumption and carbon footprint the migration to the suggested cloud can achieve.
Workshop
This paper extends previous work on a new software energy metric: energy debt. The metric reflects the implied cost, in terms of energy consumption over time, of choosing a flawed, energy-inefficient software implementation over a more robust and efficient, yet more time-consuming, approach.
This paper presents the implementation of a SonarQube plugin called E-Debitum, which calculates the energy debt of Android applications across their versions. The plugin uses a robust, well-defined, and extensible smell catalogue based on the current green software literature, with each smell defining its potential energy savings. To conclude, an experimental validation of E-Debitum was executed on three popular Android applications with multiple releases, showing how their energy debt fluctuated across releases.
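The debt computation itself can be rendered in a few lines (our reading of the metric; the smells and per-smell savings values are made up for illustration):

```python
# Potential energy savings per smell occurrence, in joules (hypothetical).
CATALOGUE = {"wakelock-leak": 120.0, "redundant-gps-poll": 45.0}

# Smell occurrences detected per release (toy data).
releases = {
    "v1.0": {"wakelock-leak": 2},
    "v1.1": {"wakelock-leak": 2, "redundant-gps-poll": 3},
}

for version, smells in releases.items():
    debt = sum(CATALOGUE[s] * n for s, n in smells.items())
    print(f"{version}: energy debt = {debt:.0f} J of potential savings")
```

Tracking this total across releases is what lets the plugin show whether a team is paying its energy debt down or accumulating more of it.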
Workshop