Research Papers
Tue 16 Nov 2021 11:00 - 11:20 at Koala - Empirical Studies Chair(s): Felipe Fronchetti, Virginia Commonwealth University

The discipline of Mining Software Repositories (MSR) transforms the passive archives of data that accrue during software development into active, value-generating solutions, such as recommendation systems. It is customary to evaluate these solutions using held-out historical data. While history-based evaluation makes pragmatic use of available data, historical records may be: (1) overly optimistic, since past recommendations may have been suboptimal choices for the task at hand; or (2) overly pessimistic, since "incorrect" recommendations may have been equal (or better) choices.
In this paper, we empirically evaluate the extent to which historical data is an appropriate benchmark for MSR solutions. As a concrete instance for experimentation, we use reviewer recommendation, which suggests community members to review change requests. We replicate the cHRev and WLRRec approaches and apply them to 9,679 reviews from the Gerrit open source community. We then assess the recommendations with members of the Gerrit reviewing community using quantitative (personalized questionnaires about their comfort level with tasks) and qualitative methods (semi-structured interviews).
We find that history-based evaluation is far more pessimistic than optimistic in the Gerrit context. Indeed, while 86% of those who had been assigned to a review in the past felt that they were well suited to handle the review, 74% of those labelled as incorrect recommendations also felt that they would have been comfortable reviewing the changes. This indicates that, on the one hand, when solutions recommend the past assignee, they should indeed be considered correct. Yet, on the other hand, recommendations labelled as incorrect because they do not match the past assignee may have been correct as well.
Our results suggest that current (reviewer) recommendation evaluations do not always model the reality of software development. Future studies may benefit from looking beyond repository data to gain a clearer understanding of the practical value of historical data in repository mining solutions.
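The history-based metric this abstract calls into question can be made concrete. Below is a minimal sketch of top-k accuracy computed against past review assignees; the function name and toy data are illustrative, not taken from the paper:

```python
def topk_accuracy(recommendations, history, k=5):
    """Fraction of reviews whose past assignee appears among the top-k
    recommended reviewers -- the conventional history-based metric that
    the study argues can be overly pessimistic."""
    hits = sum(
        1 for review_id, actual in history.items()
        if actual in recommendations.get(review_id, [])[:k]
    )
    return hits / len(history) if history else 0.0

# Toy data: recommender output vs. who actually handled each review.
recs = {"r1": ["alice", "bob"], "r2": ["carol", "dave"], "r3": ["erin"]}
past = {"r1": "bob", "r2": "erin", "r3": "erin"}
acc = topk_accuracy(recs, past, k=2)  # r1 and r3 hit, r2 misses
```

Under this metric, the recommendation "carol" for r2 counts as wrong even though, per the paper's findings, carol may well have been a comfortable reviewer for that change.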
Research Papers
Tue 16 Nov 2021 11:20 - 11:40 at Koala - Empirical Studies Chair(s): Felipe Fronchetti, Virginia Commonwealth University

WebAssembly is the newest programming language for the Web. It defines a portable bytecode format that serves as a compilation target for languages such as C, C++, and Rust. Thus, WebAssembly binaries are usually generated by WebAssembly compilers rather than written manually. In order to port native code to the Web, WebAssembly compilers need to address the differences between the source and target languages and the dissimilarities between their execution environments. WebAssembly compilers are also subject to bugs, which can severely hurt software reliability. A deep understanding of the bugs in WebAssembly compilers can help guide compiler developers on where to concentrate development and testing efforts.
In this paper, we conduct two empirical studies aimed at understanding the characteristics of the bugs encountered in WebAssembly compilers. First, we perform a qualitative study of bugs in the most widely-used WebAssembly compiler, Emscripten. We investigate 146 bug reports in Emscripten that are related to the unique challenges that WebAssembly compilers face when compared with standard compilers. Second, we conduct a quantitative study of 1,316 bugs in four open-source WebAssembly compilers, AssemblyScript, Binaryen, Emscripten, and Wasm-Bindgen. We analyze these bugs along three dimensions: lifecycle, impact, and sizes of bug-inducing inputs and bug fixes. These studies deepen our understanding of WebAssembly compiler bugs. We believe our analysis results will shed light on opportunities to design effective tools for testing and debugging WebAssembly compilers.
Industry Showcase
Tue 16 Nov 2021 11:40 - 11:50 at Koala - Empirical Studies Chair(s): Felipe Fronchetti, Virginia Commonwealth University

This paper presents industrial experiences on improving the fuzzing framework of SAP HANA to provide better configurability for continuous unit-level fuzzing. A fuzzing framework for unit-level testing should be highly configurable so that users can find and use the best configuration for each unit fuzzing driver, because each target component may have unique characteristics. To improve fuzzer configurability for unit-level testing, we provide new mutation scheduling strategies for the effective use of customized mutation operators, and new seed corpus selection strategies to configure a fuzzing campaign to check changed code. The empirical results show that these extensions give users a better chance of improving the effectiveness and efficiency of fuzzing.
Tool Demonstrations
Tue 16 Nov 2021 11:50 - 11:55 at Koala - Empirical Studies Chair(s): Felipe Fronchetti, Virginia Commonwealth University

Numerous efforts have been invested in improving the effectiveness of bug localization techniques, whereas little attention has been paid to making these tools run more efficiently in continuously evolving software repositories. This paper first analyzes the information retrieval model behind a classic bug localization tool, BugLocator, and establishes a mathematical foundation showing that the model can be updated incrementally as the codebase and bug reports evolve. We then present IncBL, a tool for Incremental Bug Localization in evolving software repositories. IncBL is evaluated on the Bugzbook dataset, and the results show that IncBL can significantly reduce running time by 77.79% on average compared with re-computing the model, while maintaining the same level of accuracy. We also implement IncBL as a GitHub App that can be easily integrated into open-source projects on GitHub, and users can also deploy and use IncBL locally. The demo video for IncBL can be viewed at https://youtu.be/G4gMuvlJSb0, and the source code can be found at https://github.com/soarsmu/IncBL
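The incremental-update idea can be illustrated with a toy TF-IDF index: when one file changes, only that file's contribution to the term statistics is retracted and re-added, rather than rebuilding the whole model. This is a sketch in the spirit of IncBL; the class, names, and weighting are ours, and BugLocator's actual model (rVSM) is more involved:

```python
import math
from collections import Counter

class IncrementalVSM:
    """Minimal incrementally updatable TF-IDF index (illustrative only)."""

    def __init__(self):
        self.docs = {}       # doc_id -> term-frequency Counter
        self.df = Counter()  # term -> number of docs containing it

    def upsert(self, doc_id, terms):
        # Incremental step: retract only the changed document's
        # contribution to the document frequencies, then re-add it.
        old = self.docs.pop(doc_id, None)
        if old is not None:
            for t in old:
                self.df[t] -= 1
        tf = Counter(terms)
        for t in tf:
            self.df[t] += 1
        self.docs[doc_id] = tf

    def score(self, query_terms):
        n = len(self.docs)
        return {
            doc_id: sum(
                tf[t] * math.log(1 + n / self.df[t])
                for t in query_terms if tf.get(t)
            )
            for doc_id, tf in self.docs.items()
        }

idx = IncrementalVSM()
idx.upsert("parser.py", ["parse", "tree", "token"])
idx.upsert("net.py", ["socket", "parse"])
before = idx.score(["tree"])
idx.upsert("net.py", ["socket", "tree", "tree"])  # file changed: cheap update
after = idx.score(["tree"])
```

Only the changed file's counters are touched by `upsert`, which is the source of the running-time savings IncBL reports at repository scale.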
Research Papers
Tue 16 Nov 2021 12:00 - 12:20 at Koala - Languages Chair(s): Jean-Guy Schneider, Deakin University

We introduce a new approach, CONCH, for de-bloating contexts for all the object-sensitive pointer analysis algorithms developed for object-oriented languages, where the calling contexts of a method are distinguished by its receiver objects. Our key insight is to approximate a recently proposed set of two necessary conditions for an object to be context-sensitive, i.e., context-dependent (whose precise verification is undecidable), with a set of three linearly verifiable conditions (in terms of the number of statements in the program) that are almost always necessary for real-world object-oriented applications, based on three key observations regarding the context-dependability of the objects they use. To create a practical implementation, we introduce a new IFDS-based algorithm for reasoning about object reachability in a program. By de-bloating contexts for two representative object-sensitive pointer analyses applied to a set of 12 representative Java programs, CONCH can speed up the two baselines substantially (3.1x on average with a maximum of 15.9x) and analyze 7 more programs scalably, at only a negligible loss of precision (less than 0.1%).
Research Papers
Tue 16 Nov 2021 12:20 - 12:40 at Koala - Languages Chair(s): Jean-Guy Schneider, Deakin University

Traditionally, high performance kernels (HPKs) have been written in statically typed languages, such as C/C++ and Fortran. A recent trend among scientists—prototyping applications in dynamic languages such as Python—created a gap between the applications and existing HPKs. Thus, scientists have to either reimplement necessary kernels or manually create a connection layer to leverage existing kernels. Either option requires substantial development effort and slows down progress in science. We present a technique, dubbed WayOut, which automatically generates the entire connection layer for HPKs written in C/C++ and invoked from Python. WayOut performs a hybrid analysis: it statically analyzes header files to generate Python wrapper classes and functions, and dynamically generates bindings for those kernels. By leveraging the type information available at run time, it generates only the necessary bindings. We evaluate WayOut by rewriting dozens of existing examples from C/C++ to Python and leveraging the HPKs enabled by WayOut. Our experiments show the feasibility of our technique, as well as negligible overhead on HPK performance.
New Ideas and Emerging Results (NIER) track
Tue 16 Nov 2021 12:40 - 12:50 at Koala - Languages Chair(s): Jean-Guy Schneider, Deakin University

WebAssembly is the newest web standard. It features a compact bytecode format that is fast to load and decode. While WebAssembly is generally expected to be faster than JavaScript, there have been mixed results on which is actually faster. Unfortunately, little research has been done to understand WebAssembly's performance advantage. In this paper, we investigate how browser engines optimize WebAssembly execution compared to JavaScript. Specifically, we measure their execution time and memory usage with diverse program inputs. Our results show that (1) when the program input size is small, WebAssembly outperforms JavaScript, but WebAssembly programs become slower for larger inputs; and (2) WebAssembly programs use significantly more memory than their JavaScript counterparts. We believe our findings can provide insights for WebAssembly virtual machine developers to identify optimization opportunities. We also report the challenges encountered when compiling the benchmarks to WebAssembly and discuss our solutions.
Tool Demonstrations
Tue 16 Nov 2021 12:50 - 12:55 at Koala - Languages Chair(s): Jean-Guy Schneider, Deakin University

Programmers often use Q&A sites (e.g., Stack Overflow) to understand the root cause of program bugs. Runtime exceptions are one such important class of bugs that is actively discussed on Stack Overflow. However, it may be difficult for beginner programmers to come up with appropriate search keywords. Moreover, they need to switch their attention between the IDE and the browser, which is time-consuming. To overcome these difficulties, we propose a method, "MAESTRO", to automatically find suitable Q&A posts for a Java runtime exception by utilizing the structural information of code described on the programming Q&A website. In this paper, we describe a usage scenario of the IDE plugin, the architecture and user interface of the implementation, and the results of user studies. A video is available at https://youtu.be/4X24jJrMUVw . Demo software is available at https://github.com/FujitsuLaboratories/Q-A-MAESTRO .
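The first step such a tool must automate is turning a raw stack trace into a search query. A rough sketch of that step for Java traces is below; the regex and function name are illustrative, not MAESTRO's implementation:

```python
import re

def exception_query(stack_trace):
    """Pull the exception class out of a Java stack trace to seed a Q&A
    search query (a simplified stand-in for automated query construction)."""
    m = re.search(r'([\w.]+(?:Exception|Error))', stack_trace)
    # Keep only the simple class name, e.g. "NullPointerException".
    return m.group(1).split('.')[-1] if m else None

trace = '''Exception in thread "main" java.lang.NullPointerException
    at com.example.App.main(App.java:10)'''
query = exception_query(trace)
```

A real tool would additionally exploit the code structure around the throwing frame, which is what MAESTRO's use of structural information refers to.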
Tool Demonstrations
Tue 16 Nov 2021 12:55 - 13:00 at Koala - Languages Chair(s): Jean-Guy Schneider, Deakin University

In this paper, we demonstrate the implementation details and usage of GenTree, a dynamic analysis tool for learning a program's interactions. Configurable software systems, while providing more flexibility to their users, are harder to develop, test, and analyze. GenTree can efficiently analyze the interactions among configuration options in configurable software. These interactions compactly represent large sets of configurations and thus allow us to efficiently analyze and discover interesting properties (e.g., bugs) in configurable software. Our experiments on 17 configurable systems spanning 4 languages show that GenTree efficiently finds precise interactions using a tiny fraction of the configuration space. GenTree and its dataset are open source and available at https://github.com/unsat/gentree, and a video demo is at https://youtu.be/x3eqUflvlN8
Research Papers
Tue 16 Nov 2021 18:00 - 18:20 at Koala - Mining and Issues Chair(s): Hongyu Zhang, University of Newcastle

Visualizations are widely used to communicate findings and make data-driven decisions. Unfortunately, creating bespoke and reproducible visualizations requires the use of procedural tools such as matplotlib. These tools present a steep learning curve, as their documentation often lacks sufficient usage examples to help beginners get started or accomplish a specific task. Forums such as StackOverflow have long helped developers search for code online and adapt it for their use. However, such forums still place the burden on the developer to sift through results and understand the code before adapting it.
We build a tool, VizSmith, which improves code reuse for visualizations by mining visualization code from Kaggle notebooks and creating a database of 7176 reusable Python functions. Given a dataset, columns to visualize, and a text query from the user, VizSmith searches this database for appropriate functions, runs them, and displays the generated visualizations to the user. At the core of VizSmith is a novel metamorphic-testing-based approach to automatically assess the reusability of functions, which improves end-to-end synthesis performance by 10% and cuts the number of execution failures by 50%.
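One way to understand the metamorphic idea: a mined function is reusable only if it depends on the columns it is passed, not on column names hardcoded from its original notebook. The sketch below checks that relation with plain dicts instead of DataFrames and plot specs instead of real plots; it is our simplified reading, not VizSmith's implementation:

```python
def check_reusable(viz_fn, data, cols):
    """Metamorphic relation: consistently renaming the columns (and passing
    the new names) should not change the function's behavior."""
    renamed = {f"col_{i}": data[c] for i, c in enumerate(cols)}
    new_cols = list(renamed)
    try:
        out1 = viz_fn(data, cols)
        out2 = viz_fn(renamed, new_cols)
    except Exception:
        return False          # crashed on the renamed dataset: not reusable
    return out1 == out2       # outputs here carry no column names

# A "good" mined function: uses only the columns it is given.
def good_fn(data, cols):
    return {"type": "scatter", "points": list(zip(data[cols[0]], data[cols[1]]))}

# A "bad" mined function: hardcodes a column name from its source notebook.
def bad_fn(data, cols):
    return {"type": "scatter", "points": list(zip(data["price"], data[cols[1]]))}

table = {"price": [1, 2], "size": [3, 4]}
```

The bad function works on the original table but raises `KeyError` on the renamed one, so the metamorphic check flags it as non-reusable without any human inspection.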
Research Papers
Tue 16 Nov 2021 18:20 - 18:40 at Koala - Mining and Issues Chair(s): Hongyu Zhang, University of Newcastle

Community live chats contain rich sets of information with the potential to improve software quality and productivity. One important application is to mine knowledge on issues and their potential solutions. However, it remains a challenging problem to accurately mine such knowledge due to the noisy nature of interleaved dialogs in live chat data. In this paper, we first formulate the problem of issue-solution pair extraction from developer live chat data, and propose an automated approach, named ISPY, based on natural language processing and deep learning techniques with customized enhancements, to address the problem. Specifically, ISPY automates three tasks: 1) disentangling live chat logs, employing a feedforward neural network to automatically disentangle a conversation history into separate dialogs; 2) detecting dialogs discussing issues, using a novel convolutional neural network (CNN) that consists of a BERT-based utterance embedding layer, a context-aware dialog embedding layer, and an output layer; and 3) extracting appropriate utterances and combining them into corresponding solutions, based on the same CNN structure but with different inputs. To evaluate ISPY, we compare it with six baselines on a dataset of 750 dialogs, including 171 issue-solution pairs, drawn from eight Gitter communities. The results show that, for issue detection, our approach achieves an F1 of 76% and outperforms all baselines by 30%. For solution extraction, our approach achieves an F1 of 63% and outperforms the baselines by 20%. Furthermore, we apply ISPY to three new communities to extensively evaluate its practical usage. Moreover, we publish over 30K issue-solution pairs extracted from 11 communities. We believe that ISPY can facilitate community-based software development by promoting knowledge sharing and shortening the issue-resolving process.
New Ideas and Emerging Results (NIER) track
Tue 16 Nov 2021 18:40 - 18:50 at Koala - Mining and Issues Chair(s): Hongyu Zhang, University of Newcastle

Code comments are vital for software development and maintenance. To supplement code comments, researchers have designed automatic tools for method-level comment generation. Prior tools generate comments that explain code functionalities, but cannot generate comments that explain why code was developed as it is. Issue reports contain rich information on how code was maintained. The valuable details of issue reports (e.g., maintenance types, symptoms, and purposes of modifications) are useful for understanding source code, especially when programmers want to learn why code was developed in a specific way. To generate such comments, it is desirable for an approach to automatically build links between code fragments and issue reports. In this paper, we propose the first approach for this research purpose. Our results show that it relinks more than 70% of the issue numbers that programmers write in code comments. Furthermore, the links built by our tool cover 4X the bugs, and 10X the other issues, covered by the links written in manual comments. We present samples of our built links, and explain why our links are useful for describing the functionalities and the purpose of code.
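The relinking baseline this abstract measures against starts from issue ids that developers already mention in text. A minimal sketch of that extraction step from commit messages is below; the id patterns are illustrative, as real trackers use varied conventions:

```python
import re

def issue_links(commit_messages):
    """Map issue numbers to the commits whose messages mention them
    (a naive sketch of commit-to-issue relinking)."""
    links = {}
    for sha, msg in commit_messages.items():
        for num in re.findall(r'(?:#|[Ii]ssue[ -]?)(\d+)', msg):
            links.setdefault(int(num), []).append(sha)
    return links

msgs = {
    "abc123": "Fix NPE in parser (#42)",
    "def456": "Refactor; closes issue 42 and #7",
}
linked = issue_links(msgs)
```

The paper's point is that a dedicated relinking approach recovers far more code-to-issue links than such explicit mentions alone provide.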
New Ideas and Emerging Results (NIER) track
Tue 16 Nov 2021 18:50 - 19:00 at Koala - Mining and Issues Chair(s): Hongyu Zhang, University of Newcastle

Although issue reports are useful, some of them can become obsolete, in that their corresponding commits are overwritten or rolled back as the software evolves. Obsolete issue reports can invalidate their references and descriptions, and can have far-reaching impacts on the approaches built on them. In this paper, given an issue report, we define its obsolete ratio as the number of its modified lines that appear in the latest source files over its total number of modified code lines. We build a tool called ICLINKER that automatically relinks an issue report to its commits, and compares its modified files with the latest files to calculate the ratio. To the best of our knowledge, we conduct the first empirical study to analyze obsolete issue reports.
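Following the ratio definition above literally, a toy computation looks like this; the naive exact-line matching is our simplification, since ICLINKER's comparison of modified files against the latest sources is certainly richer:

```python
def obsolete_ratio(modified_lines, latest_lines):
    """Fraction of an issue's modified code lines that still appear in
    the latest source files, per the abstract's definition (line matching
    here is an illustrative exact-string check)."""
    if not modified_lines:
        return 0.0
    latest = set(latest_lines)
    surviving = sum(1 for line in modified_lines if line in latest)
    return surviving / len(modified_lines)

mods = ["x = parse(cfg)", "return x", "log.debug(x)"]
latest = ["def load(cfg):", "x = parse(cfg)", "return x"]
ratio = obsolete_ratio(mods, latest)  # 2 of the 3 modified lines survive
```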
Research Papers
Tue 16 Nov 2021 19:00 - 19:20 at Koala - Using Knowledge Chair(s): Dalal Alrajeh, Imperial College London

Many modern software engineering tools integrate SMT decision procedures and rely on the accuracy and performance of SMT solvers. We describe four basic patterns for integrating constraint solvers (earliest verdict, majority vote, feature-based solver selection, and verdict-based second attempt) that can be used for combining individual solvers into meta-decision procedures that balance accuracy, performance, and cost – or optimize for one of these metrics. In order to evaluate the effectiveness of meta-solving, we analyze and minimize 16 existing benchmark suites and benchmark seven state-of-the-art SMT solvers on 17k unique instances. From the obtained performance data, we can estimate the performance of different meta-solving strategies. We validate our results by implementing and analyzing one strategy. As additional results, we obtain (a) the first benchmark suite of unique SMT string problems with validated expected verdicts, (b) an extensive dataset containing data on benchmark instances as well as on the performance of individual decision procedures and several meta-solving strategies on these instances, and (c) a framework for generating data that can easily be used for similar analyses on different benchmark instances or for different decision procedures.
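The first two of the four integration patterns are simple enough to sketch directly. Below, solvers are plain Python callables standing in for real SMT backends, with `None` meaning "unknown/timeout"; the stubs and names are ours:

```python
def earliest_verdict(solvers, instance):
    """Pattern 1: return the first definite answer, skipping solvers
    that give up."""
    for solve in solvers:
        verdict = solve(instance)
        if verdict is not None:
            return verdict
    return None

def majority_vote(solvers, instance):
    """Pattern 2: run every solver and return the most common definite
    verdict."""
    votes = [v for v in (solve(instance) for solve in solvers) if v is not None]
    return max(set(votes), key=votes.count) if votes else None

# Hypothetical solver stubs with different strengths.
s1 = lambda inst: None                                  # always times out
s2 = lambda inst: "sat"                                 # overly optimistic
s3 = lambda inst: "sat" if len(inst) > 3 else "unsat"
s4 = lambda inst: "unsat"
```

Earliest verdict optimizes latency but inherits the first solver's errors; majority vote pays for every solver run in exchange for robustness, which is exactly the accuracy/performance/cost trade-off the paper studies.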
Research Papers
Tue 16 Nov 2021 19:20 - 19:40 at Koala - Using Knowledge Chair(s): Dalal Alrajeh, Imperial College London

Assigning bugs to the right components is a prerequisite for getting bugs analyzed and fixed. Classification-based techniques have been used in practice for assisting bug component assignment, for example, the BugBug tool developed by Mozilla. However, our study of 124,477 bugs in Mozilla products reveals that erroneous bug component assignments occur frequently and widely. Most errors are repeated errors, and some are even misled by the BugBug tool. Our study reveals that complex component designs as well as misleading component names and bug report keywords confuse bug component assignment not only for bug reporters but also for developers and even bug triaging tools. In this work, we propose a learning-to-rank framework that learns to assign components to bugs from correct, erroneous, and irrelevant bug-component assignments in the history. To inform the learning, we construct a bug tossing knowledge graph which incorporates not only goal-oriented component tossing relationships but also rich information about the component tossing community, component descriptions, and historically closed and tossed bugs, from which three categories and seven types of features for bugs, components, and bug-component relations can be derived. We evaluate our approach on a dataset of 98,587 closed bugs (including 29,100 tossed bugs) of 186 components in six Mozilla products. Our results show that our approach significantly improves bug component assignment for both tossed and non-tossed bugs over the BugBug tool and over BugBug enhanced with component tossing relationships, with >20% Top-k accuracies and >30% NDCG@k (k=1,3,5,10).
Research Papers
Tue 16 Nov 2021 19:40 - 20:00 at Koala - Using Knowledge Chair(s): Dalal Alrajeh, Imperial College London

A symbolic execution engine regularly queries a Satisfiability Modulo Theories (SMT) solver to determine the reachability of code during execution. Unfortunately, the SMT solver is often the bottleneck of symbolic execution. Inspired by abstract interpretation, we propose an abstract symbolic execution (ASE) engine which aims at querying the SMT solver less often by trying to compute reachability faster through an increasingly weaker abstraction. For this purpose, we have designed and implemented a value set decision procedure based on so-called strided value intervals (SVI) for efficiently determining precise or under-approximating value sets for variables. Our ASE engine begins reasoning with respect to the SVI abstraction, and only if needed falls back on the theory of bit-vectors implemented in SMT solvers. Our ASE engine efficiently detects when the former abstraction becomes incomplete, so that it can move on and try the next abstraction.
We have designed and implemented a prototype of our engine for a subset of 64-bit RISC-V. Our experimental evaluation shows that our prototype often improves symbolic execution time by significantly reducing the number of SMT queries, while, whenever the abstractions do not work, the overhead of trying them remains low.
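A strided value interval represents the set {lo, lo+stride, ..., hi}. The small model below shows membership and an over-approximating join; the class and field names are illustrative, and the real decision procedure works over 64-bit machine arithmetic rather than Python integers:

```python
from math import gcd

class SVI:
    """Strided value interval {lo, lo + stride, ..., hi} (toy model)."""

    def __init__(self, lo, hi, stride):
        self.lo, self.hi, self.stride = lo, hi, stride

    def __contains__(self, v):
        if not (self.lo <= v <= self.hi):
            return False
        # stride == 0 encodes a singleton {lo}.
        return self.stride == 0 or (v - self.lo) % self.stride == 0

    def join(self, other):
        # Over-approximating least upper bound: widen the bounds and pick
        # a stride dividing both strides and the offset between the bases.
        lo = min(self.lo, other.lo)
        hi = max(self.hi, other.hi)
        stride = gcd(gcd(self.stride, other.stride), abs(self.lo - other.lo))
        return SVI(lo, hi, stride)

a = SVI(0, 8, 4)   # {0, 4, 8}
b = SVI(2, 10, 4)  # {2, 6, 10}
c = a.join(b)      # gcd(4, 4, 2) = 2, so {0, 2, ..., 10}
```

Membership and join are constant-time here, which is the point of answering reachability in the abstraction first and only falling back to the bit-vector theory when the abstraction loses too much precision.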
Social/Networking
Tue 16 Nov 2021 20:00 - 21:00 at Koala - Ask Me Anything - Tao Xie Chair(s): Rahul Gopinath, CISPA, Germany

Research Papers
Tue 16 Nov 2021 21:00 - 21:20 at Koala - APIs Chair(s): Timo Kehrer, Humboldt University of Berlin

Libraries are widely adopted in developing software projects. Library APIs often go missing during library evolution, as library developers may deprecate, remove, or refactor APIs. As a result, client developers have to manually find replacement APIs for missing APIs when updating library versions in their projects, which is a difficult and expensive software maintenance task. One key limitation of existing automated approaches is that they usually consider the library itself as the single source for finding replacement APIs, which heavily limits their accuracy.
In this paper, we first present an empirical study to understand the characteristics of missing APIs and their replacements. Specifically, we quantify the prevalence of missing APIs, and summarize the knowledge sources where the replacements are found, as well as the code change and mapping cardinality between missing APIs and their replacements. Then, inspired by the insights from our study, we propose a heuristic-based approach, RepFinder, to automatically find replacements for missing APIs during library updates. We design and combine a set of heuristics to hierarchically search three sources (deprecation message, own library, and external library) for replacements. Our evaluation demonstrates that RepFinder can find replacement APIs effectively and efficiently, and significantly outperforms the state-of-the-art approaches.
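The first of the three sources, the deprecation message, often names the replacement outright. A sketch of that heuristic is below; the phrase patterns are our guess at common Javadoc idioms, not RepFinder's actual rules:

```python
import re

def replacement_from_deprecation(message):
    """Mine a deprecation message for a suggested replacement API
    (illustrative patterns only)."""
    patterns = [
        r'[Uu]se\s+\{@link\s+([\w.#()]+)\}',   # Javadoc {@link ...} form
        r'[Uu]se\s+([\w.#()]+)\s+instead',
        r'[Rr]eplaced\s+by\s+([\w.#()]+)',
    ]
    for p in patterns:
        m = re.search(p, message)
        if m:
            return m.group(1)
    return None

msg = "@deprecated Use {@link java.time.Instant#now()} instead."
rep = replacement_from_deprecation(msg)
```

When this source yields nothing, a hierarchical approach falls through to searching the library itself and then external libraries, mirroring the three-source ordering described above.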
Research Papers
Tue 16 Nov 2021 21:20 - 21:40 at Koala - APIs Chair(s): Timo Kehrer, Humboldt University of Berlin

Recommender systems in software engineering provide developers with a wide range of valuable items to help them complete their tasks. Among others, API recommender systems have gained momentum in recent years as they have become more successful at suggesting API calls or code snippets. While these systems have proven to be effective in terms of prediction accuracy, less attention has been paid to such recommenders' resilience against adversarial attempts. In fact, by crafting the recommenders' learning material, e.g., data from large open-source software (OSS) repositories, hostile users may succeed in injecting malicious data, putting at risk the software clients adopting API recommender systems. In this paper, we present an empirical investigation of adversarial machine learning techniques and their possible influence on recommender systems. The evaluation performed on three state-of-the-art API recommender systems reveals a worrying outcome: none of them is immune to malicious data. This result triggers the need for effective countermeasures to protect recommender systems against hostile attacks disguised in training data.
Research Papers
Tue 16 Nov 2021 22:00 - 22:20 at Koala - Applications Chair(s): Ingo Mueller, Monash University

Autonomous Driving Systems (ADSs) are complex systems that must satisfy multiple requirements, such as safety, compliance with traffic rules, and comfort. However, satisfying all these requirements may not always be possible due to emerging environmental conditions. Therefore, an ADS may have to make trade-offs among multiple requirements during operation, resulting in one or more requirements violations. For ADS engineers, it is highly important to know which combinations of requirements violations may occur, as different combinations can expose different types of failures. However, there is currently no testing approach that can generate scenarios to expose different combinations of requirements violations. To address this issue, in this paper, we introduce the notion of a requirements violation pattern to characterize a specific combination of requirements violations. Based on this notion, we propose a testing approach named EMOOD that can effectively generate test scenarios to expose as many requirements violation patterns as possible. EMOOD uses a prioritization technique to sort all possible patterns to search for, from the most to the least critical. Then, EMOOD iteratively applies an evolutionary many-objective optimization algorithm to find different combinations of requirements violations. In each iteration, the targeted pattern is determined by a dynamic prioritization technique that gives preference to patterns with higher criticality and higher likelihood of occurring. We apply EMOOD to an industrial ADS under two common traffic situations. Evaluation results show that EMOOD outperforms three baseline approaches in generating test scenarios, discovering more requirements violation patterns.
Industry Showcase
Tue 16 Nov 2021 22:20 - 22:30 at Koala - Applications Chair(s): Ingo Mueller, Monash University

Process modeling can benefit from automation using knowledge mined from collections of existing processes. One promising technique for such automation is the recommendation of the next elements to be added to the process under construction. In this paper, we review an autocompletion engine that is based on the semantic similarity of business processes. To assess its efficiency in practical settings, we conduct a user study where domain experts are asked to rate the suggestions made by the engine for a commercial product. Their ratings are then compared to the engine's accuracy measured by metrics from the natural language processing field. Our study shows a strong correlation between the expert ratings and some of these metrics. We confirm the usefulness of such an autocompletion engine, and enumerate potential improvements applicable to any process autocompletion technique.
Industry Showcase
Tue 16 Nov 2021 22:30 - 22:40 at Koala - Applications Chair(s): Ingo Mueller, Monash University

Modernizing information systems is a recurring need for large enterprises. Data migration from the source to the target information system is a critical step in a successful modernization project. Central to data migration is the data transform that transforms the source system data into the target system's format. Though different commercial tools are available to address the data migration challenge, creating the data transformation specification is largely a manual, knowledge-intensive, and expert-driven process. In this paper, we present a tool that assists experts in creating the data transformation specification by suggesting candidate field matches between the source and target data models, along with rules for the data transformation. Our tool is adaptive in the sense that it can take user feedback, in the form of corrected matches and validation data, and then propose new matches and transformation rules for the remaining fields. It uses machine learning and knowledge representation to learn and infer the candidate matches, and program synthesis to infer the transformation rules. We have run our tool on real industrial data. Our schema matching recall@5 score is 0.76, meaning that for 76 out of 100 fields, the experts find the correct field match within the first 5 tool-recommended matches. The recall@2 score of the rule generator is 0.81, meaning that the experts find the correct transformation rule within the first 2 tool-suggested rules.
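The recall@k metric used above can be stated in a few lines. The sketch below uses made-up field names to illustrate the computation:

```python
def recall_at_k(suggestions, ground_truth, k):
    """Fraction of fields whose correct match appears among the first k
    tool suggestions."""
    hits = sum(
        1 for field, correct in ground_truth.items()
        if correct in suggestions.get(field, [])[:k]
    )
    return hits / len(ground_truth)

# Hypothetical source fields with ranked target-field suggestions.
sugg = {
    "cust_name": ["customer.name", "customer.id"],
    "amt": ["order.total", "order.tax", "order.amount"],
}
truth = {"cust_name": "customer.name", "amt": "order.amount"}
```

So a recall@5 of 0.76 means the expert finds the right match in the top five suggestions for 76% of the fields, and must fall back to manual search for the rest.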
New Ideas and Emerging Results (NIER) track
Tue 16 Nov 2021 22:40 - 22:50 at Koala - Applications Chair(s): Ingo Mueller, Monash University

Decision systems such as Multiple-Criteria Decision Analysis systems formulate a decision process in terms of a mathematical function that takes into consideration different aspects of a problem. Testing such systems is crucial, as they are usually employed in safety-critical settings. A good test suite for these systems should be able to exercise all the possible types of decisions that can be taken by the system. Classic structural coverage criteria do not provide good test suites in this sense, as they can be fulfilled by simple tests that cover only one possible type of decision. Therefore, in this paper, we discuss the need for tailored coverage criteria for this class of systems, and we propose a criterion based on the perturbation of the decision system's parameters. We demonstrate the effectiveness of the criterion, compared to classic structural coverage criteria, on a path planner for autonomous driving. We also discuss other benefits, such as the criterion helping explain why a decision was made during a test.
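To make the perturbation idea concrete, take a toy weighted-sum decision function: a test exercises a parameter if nudging that parameter's weight flips the decision. This is our reading of the criterion as a sketch, not the paper's exact formulation:

```python
def decide(weights, scores):
    """Toy multi-criteria decision: pick the option with the highest
    weighted sum of criteria scores."""
    return max(scores, key=lambda o: sum(w * s for w, s in zip(weights, scores[o])))

def perturbation_coverage(weights, tests, eps=0.1):
    """A test 'covers' parameter i if perturbing weight i by +/-eps
    changes the decision on that test. Returns the covered fraction of
    (test, parameter) pairs."""
    covered, total = 0, len(tests) * len(weights)
    for scores in tests:
        base = decide(weights, scores)
        for i in range(len(weights)):
            if any(decide(weights[:i] + [weights[i] + d] + weights[i + 1:], scores) != base
                   for d in (eps, -eps)):
                covered += 1
    return covered / total

w = [0.5, 0.5]
borderline = {"A": (1, 0), "B": (0, 1)}  # decision flips under tiny nudges
robust = {"A": (10, 10), "B": (0, 0)}    # decision never flips
cov = perturbation_coverage(w, [borderline, robust])
```

The robust test contributes nothing to coverage because no parameter influences its outcome, which is exactly the kind of uninformative test that can still satisfy classic structural criteria.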
Research Papers
Wed 17 Nov 2021 08:00 - 08:20 at Koala - Verification Chair(s): Nazareno Aguirre, University of Rio Cuarto and CONICET, Argentina

DNN validation and verification approaches that are input-distribution agnostic waste effort on irrelevant inputs and report false property violations. Drawing on the large body of work on model-based validation and verification of traditional systems, we introduce the first approach that leverages environmental models to focus DNN falsification and verification on the relevant input space. Our approach, DFV, automatically builds an input distribution model using unsupervised learning, prefixes that model to the DNN to force all inputs to come from the learned distribution, and reformulates the property over the input space of the distribution model. This transformed verification problem allows existing DNN falsification and verification tools to target the input distribution – avoiding consideration of infeasible inputs. Our study of DFV with 7 falsification and verification tools, two DNNs defined over different data sets, and 93 distinct distribution models provides clear evidence that the counter-examples found by the tools are much more representative of the data distribution, and it shows how the performance of DFV varies across domains, models, and tools.
Research Papers
Wed 17 Nov 2021 08:20 - 08:40 at Koala - Verification Chair(s): Nazareno Aguirre, University of Rio Cuarto and CONICET, Argentina

Many program verification tools can be customized via run-time configuration options that trade off performance, precision, and soundness. However, in practice, users often run tools under their default configurations, because understanding these tradeoffs requires significant expertise. In this paper, we ask how well a single, default configuration can work in general, and we propose SATune, a novel tool for automatically configuring program verification tools for given target programs. To answer our question, we gathered a dataset that runs four well-known program verification tools against a range of C and Java benchmarks, with results labeled as correct, incorrect, or inconclusive (e.g., timeout). Examining the dataset, we find there is generally no one-size-fits-all best configuration. Moreover, a statistical analysis shows that many individual configuration options do not have simple tradeoffs: they can be better or worse depending on the program.
Motivated by these results, we developed SATune, which constructs configurations using a meta-heuristic search. The search is guided by a surrogate fitness function trained on our dataset. We compare the performance of SATune to three baselines: a single configuration with the most correct results in our dataset; the most precise configuration followed by the most correct configuration (if needed); and the most precise configuration followed by random search (also if needed). We find that SATune outperforms these approaches by completing more correct tasks with high precision. In summary, our work shows that good configurations for verification tools are not simple to find, and SATune takes an important first step towards automating the process of finding them.
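For readers unfamiliar with surrogate-guided configuration search, the sketch below illustrates the general idea with a toy, invented configuration space and a hard-coded surrogate fitness function (SATune's surrogate is trained on real verification-tool runs, and its search is meta-heuristic rather than this simple coordinate ascent):

```python
# Toy configuration space for a hypothetical verifier.
# All option names and the fitness function are invented for illustration.
SPACE = {
    "unroll_bound": [2, 4, 8, 16],
    "solver": ["smt", "bdd"],
    "check_overflow": [True, False],
}

def surrogate_fitness(config):
    # Stand-in for a model trained on labeled (configuration, result) data.
    score = config["unroll_bound"]
    if config["solver"] == "smt":
        score += 5
    if config["check_overflow"]:
        score -= 1
    return score

def tune(start):
    # Greedy coordinate ascent: repeatedly flip one option if the
    # surrogate predicts an improvement, until a fixed point is reached.
    current = dict(start)
    improved = True
    while improved:
        improved = False
        for key, values in SPACE.items():
            for v in values:
                candidate = dict(current, **{key: v})
                if surrogate_fitness(candidate) > surrogate_fitness(current):
                    current, improved = candidate, True
    return current

best = tune({"unroll_bound": 2, "solver": "bdd", "check_overflow": True})
print(best)
```

Real tools replace the hard-coded fitness with a learned model and use a richer search, but the structure is the same: propose a configuration, score it cheaply with the surrogate, and keep improvements.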
Research Papers
Wed 17 Nov 2021 08:40 - 09:00 at Koala - Verification Chair(s): Nazareno Aguirre University of Rio Cuarto and CONICET, ArgentinaSignal temporal logic (STL) is widely used to specify and analyze properties of cyber-physical systems with continuous behaviors. But STL model checking is still quite limited; existing STL model checking methods are either incomplete or very inefficient. This paper presents a new SMT-based model checking algorithm for verifying STL properties of cyber-physical systems. We propose a novel technique to reduce the STL bounded model checking problem to the satisfiability of a first-order logic formula over reals, which can be solved using state-of-the-art SMT solvers. Our algorithm is based on a new theoretical result, presented in this paper, to build a small but complete discretization of continuous signals, which preserves the bounded satisfiability of STL. Our modular translation method allows an efficient STL model checking algorithm that is refutationally complete for bounded signals, and that is much more scalable than the previous refutationally complete algorithm.
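As a concrete (and heavily simplified) illustration of the question such a model checker decides, the snippet below evaluates two bounded STL operators over a sampled signal. The paper's actual contribution is a complete SMT encoding over continuous signals, which a naive sampling like this cannot provide:

```python
# Bounded-time semantics of two STL operators over a sampled signal.
# (Illustrative only: real STL model checking must reason about the
# continuous signal between samples, which is what the SMT encoding does.)

def always(signal, a, b, pred):
    """G[a,b] pred: pred holds at every sample time in [a, b]."""
    return all(pred(v) for t, v in signal if a <= t <= b)

def eventually(signal, a, b, pred):
    """F[a,b] pred: pred holds at some sample time in [a, b]."""
    return any(pred(v) for t, v in signal if a <= t <= b)

# A sampled signal: (time, value) pairs, e.g. a temperature trace.
sig = [(0.0, 20.0), (0.5, 23.0), (1.0, 26.0), (1.5, 24.0), (2.0, 21.0)]

# Property: within [0, 2], the value eventually exceeds 25
# and always stays below 30.
ok = (eventually(sig, 0.0, 2.0, lambda x: x > 25)
      and always(sig, 0.0, 2.0, lambda x: x < 30))
print(ok)  # True for this signal
```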
Research Papers
Wed 17 Nov 2021 09:00 - 09:20 at Koala - Analysis I Chair(s): Pavneet Singh Kochhar MicrosoftMutation analysis is a powerful dynamic approach with many applications, such as measuring the quality of test suites or automatically locating faults. However, its inherently low scalability hampers its practical use. To accelerate mutation analysis, researchers have proposed approaches that reduce redundant executions. A family of fork-based approaches tries to share identical executions among mutants. Fork-based approaches carry all mutants in one process and decide whether to fork new child processes when reaching a mutated statement. The mutants carried by the parent process are split into groups and distributed to different processes to finish the remaining executions. However, existing fork-based approaches have two limitations: (1) the limited analysis scope of a single statement used to compare and cluster mutants prevents these systems from detecting more equivalent mutants, and (2) the interpretation of the mutants and the runtime equivalence analysis introduce significant overhead.
In this paper, we present WinMut, a novel fork-based mutation analysis approach that (1) groups mutants in a scope of mutated statements and (2) removes redundant computations inside interpreters. WinMut not only reduces the number of invoked processes, but also lowers the cost of executing a single process. Our experiments show that our approach further accelerates mutation analysis, with an average speedup of 9.52x on top of the state-of-the-art fork-based approach, AccMut.
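The idea of sharing executions among mutants can be sketched in a few lines: mutants that compute the same value at the mutated statement are indistinguishable from that point on, so one (forked) process can carry the whole group. This toy only illustrates the general fork-based idea, not WinMut itself:

```python
# Toy illustration of execution sharing in fork-based mutation analysis:
# mutants that produce the same value at the mutated statement can share
# one child process for the rest of the run.

def run_mutants(x):
    # Original statement: y = x + 1. Each mutant swaps the operator.
    mutant_ops = {
        "m_add": lambda a: a + 1,   # original operator
        "m_sub": lambda a: a - 1,
        "m_mul": lambda a: a * 1,
        "m_or":  lambda a: a | 1,
    }
    # Group mutants by the value they compute at the mutated statement;
    # each group would be carried on by a single forked process.
    groups = {}
    for name, op in mutant_ops.items():
        groups.setdefault(op(x), []).append(name)
    return groups

groups = run_mutants(3)
print(groups)  # 3 * 1 == 3 and 3 | 1 == 3, so m_mul and m_or share a group
```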
Research Papers
Wed 17 Nov 2021 09:20 - 09:40 at Koala - Analysis I Chair(s): Pavneet Singh Kochhar MicrosoftStandard software analytics often requires a large amount of labelled data to commission models with acceptable performance. However, prior work has shown that such requirements can be expensive (taking several weeks to label thousands of commits) and that labelled data is not always available when exploring new research problems and domains. Unsupervised learning is a promising direction for learning hidden patterns within unlabelled data, but it has been extensively studied only in defect prediction. Moreover, unsupervised learning can be ineffective by itself and has not been explored in other domains (e.g., static analysis and issue close time).
Motivated by this literature gap and these technical limitations, we explore the performance variations of several simple optimization schemes. We present FRUGAL, a tuned semi-supervised method built on a simple optimization scheme that does not require sophisticated (e.g., deep learning) or expensive (e.g., 100% manually labelled data) methods. Our method optimizes the unsupervised learner's configurations in a grid-search manner while validating the chosen settings on only 10% of the labelled training data before predicting. FRUGAL outperforms the state-of-the-art actionable static code warning recognizer and issue close time predictor with less information, reducing the cost of labelling by 90%.
Our conclusions are two-fold. First, FRUGAL can save considerable effort in data labelling, especially when validating prior work or researching new problems. Second, proponents of complex and expensive methods should always baseline those methods against simpler and cheaper alternatives. For instance, a semi-supervised learner like FRUGAL can serve as a baseline for state-of-the-art software analytics tools.
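Stripped to its essentials, the recipe looks like the sketch below: a toy threshold rule stands in for a real unsupervised learner, its single configuration knob is tuned by grid search, and only a 10% labelled slice is used for validation (the data and the learner are invented for illustration):

```python
# A FRUGAL-style recipe in miniature: tune an unsupervised learner's
# configuration by grid search, validating on only 10% of the labels.

data = [  # (metric value, true label); labels exist but most go unused
    (0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0), (0.5, 0),
    (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1), (0.95, 1),
] * 3

def predict(threshold, x):
    # The "unsupervised learner": flag anything above a threshold.
    return 1 if x >= threshold else 0

def accuracy(threshold, rows):
    return sum(predict(threshold, x) == y for x, y in rows) / len(rows)

# Validate candidate configurations on a small labelled slice
# (3 of 30 rows = 10%); the rest of the labels are never consulted.
validation = data[4:7]
grid = [i / 10 for i in range(1, 10)]
best = max(grid, key=lambda t: accuracy(t, validation))
print(best, accuracy(best, data))
```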
Industry Showcase
Wed 17 Nov 2021 09:40 - 09:50 at Koala - Analysis I Chair(s): Pavneet Singh Kochhar MicrosoftContinuous Integration is a critical activity for software maintenance in global projects, and it can compromise a company's performance: resolution times tend to accumulate due to the approval process, conflict resolution, tests, and validations. The validation process involves interpreting commit descriptions and can be automated with NLP mechanisms. This paper presents an intelligent NLP-based approach that evaluates whether commits can be integrated into a software release based only on their descriptions, including paraphrasing. The solution is validated using pre-classified commits in the training stage and achieves an accuracy of 92.9%. The proposed approach allows commits to be integrated into a release automatically and with confidence, reducing software management effort.
Journal-first Papers
Wed 17 Nov 2021 09:50 - 10:00 at Koala - Analysis I Chair(s): Pavneet Singh Kochhar MicrosoftMillions of smart contracts have been deployed onto Ethereum for providing various services, whose functions can be invoked. For this purpose, the caller needs to know the function signature of a callee, which includes its function id and parameter types. Such signatures are critical to many applications focusing on smart contracts, e.g., reverse engineering, fuzzing, attack detection, and profiling. Unfortunately, it is challenging to recover function signatures from contract bytecode, since neither debug information nor type information is present in the bytecode. To address this issue, prior approaches rely on source code, or on collections of known signatures from incomplete databases or incomplete heuristic rules, which are far from adequate and cannot cope with the rapid growth of new contracts. In this paper, we propose a novel solution that leverages how functions are handled by the Ethereum virtual machine (EVM) to automatically recover function signatures. In particular, we exploit how smart contracts determine the functions to be invoked to locate and extract function ids, and propose a new approach named type-aware symbolic execution (TASE) that utilizes the semantics of EVM operations on parameters to identify the number and types of parameters. Moreover, we develop SigRec, a new tool for recovering function signatures from contract bytecode without the need for source code or function signature databases. Extensive experimental results show that SigRec outperforms all existing tools, achieving an unprecedented 98.7% accuracy within 0.074 seconds. We further demonstrate that the recovered function signatures are useful in attack detection, fuzzing, and reverse engineering of EVM bytecode.
The original paper can be downloaded from https://ieeexplore.ieee.org/document/9426396.
Social/Networking
Wed 17 Nov 2021 10:00 - 11:00 at Koala - Ask Me Anything - Myra Cohen Chair(s): Muhammad Ali Gulzar Virginia Tech, USAMyra Cohen is a Professor and the Lanh and Oanh Nguyen Chair in Software Engineering in the Department of Computer Science at Iowa State University. Prior to that she was a Susan J. Rosowski Professor at the University of Nebraska-Lincoln, where she was a member of the ESQuaReD software engineering research group. She received her Ph.D. from the University of Auckland, New Zealand.
Her research interests are in software testing of highly-configurable software, search-based software engineering, applications of combinatorial designs, and synergies between systems and synthetic biology and software engineering. She is the recipient of an NSF CAREER award and an AFOSR Young Investigator Award, and has received 4 ACM Distinguished Paper awards. She serves on many software engineering conference program committees and is on the ASE Steering Committee. She was the general chair of ASE in 2015 and the program co-chair for ICST 2019 and ESEC/FSE 2020. She is an ACM Distinguished Scientist.
Please come and ask her anything.
Research Papers
Wed 17 Nov 2021 11:00 - 11:20 at Koala - Large Scale Systems Chair(s): Ingo Mueller Monash UniversityThis paper is an experience paper.
For large-scale distributed systems, it is crucial to diagnose the root causes of incidents efficiently in order to maintain high system availability. The recent development of microservice architecture brings three major challenges (i.e., operation, system scale, and monitoring complexities) to root cause analysis (RCA) in industrial settings. To tackle these challenges, we present Groot, an event-graph-based approach for RCA. Groot constructs a real-time causality graph based on events that summarize various types of metrics, logs, and activities in the system under analysis. Moreover, to incorporate domain knowledge from site reliability engineering (SRE) engineers, Groot can be customized with user-defined events and domain-specific rules. Currently, Groot performs RCA among 5,000 services and is actively used by the SRE team in a global e-commerce system serving more than 185 million active buyers per year. Over 15 months, we collected a data set containing the labeled root causes of 952 real production incidents for evaluation. The evaluation results show that Groot achieves 95% top-3 accuracy and 78% top-1 accuracy. To share our experience in deploying and adopting RCA in industrial settings, we conducted a survey showing that users of Groot find it helpful and easy to use. We also share the lessons learned from deploying and adopting Groot to solve RCA problems in production environments.
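To make the event-graph idea concrete, here is a minimal sketch: events are linked by "possibly caused by" edges, and candidate root causes are the events reachable from the alerting symptom that have no further causes. The graph and event names are invented; Groot builds its graphs in real time from metrics, logs, and SRE-defined rules:

```python
# Minimal sketch of event-graph-based root cause analysis.
# edges[e] = events that may have caused e (all names hypothetical).
edges = {
    "checkout-latency-alert": ["payment-errors", "db-slow-queries"],
    "payment-errors": ["db-slow-queries"],
    "db-slow-queries": ["db-config-change"],
    "db-config-change": [],
}

def candidate_root_causes(symptom):
    # Walk the causality graph backwards from the symptom; events with
    # no further causes are the candidate root causes.
    seen, stack, roots = set(), [symptom], []
    while stack:
        event = stack.pop()
        if event in seen:
            continue
        seen.add(event)
        causes = edges.get(event, [])
        if not causes and event != symptom:
            roots.append(event)
        stack.extend(causes)
    return roots

print(candidate_root_causes("checkout-latency-alert"))
```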
Industry Showcase
Wed 17 Nov 2021 11:20 - 11:30 at Koala - Large Scale Systems Chair(s): Ingo Mueller Monash UniversityWhen optimizing software for the cloud, monolithic applications need to be partitioned into many smaller microservices. While many tools have been proposed for this task, we warn that the evaluation of those approaches has been incomplete; e.g., there has been minimal prior exploration of hyperparameter optimization. Using a set of open-source Java EE applications, we show here that (a) such optimization can significantly improve microservice partitioning, and (b) an open issue for future work is how to find which optimizer works best for different problems. To facilitate that future work, see https://github.com/yrahul3910/ase-tuned-mono2micro for a reproduction package for this research.
Industry Showcase
Wed 17 Nov 2021 11:30 - 11:40 at Koala - Large Scale Systems Chair(s): Ingo Mueller Monash UniversityTesting is one of the most important steps in software development: it ensures the quality of software. Continuous Integration (CI) is a widely used testing standard that can report software quality to the developer in a timely manner during development. Performance, especially scalability, is another key factor for High Performance Computing (HPC) applications. There are many existing profiling and performance tools for HPC applications, but none of them are integrated into CI tools. In this work, we propose BeeSwarm, a container-based parallel scaling performance system for HPC that can be easily applied to current CI test environments. BeeSwarm is mainly designed for HPC application developers who need to monitor how their applications scale on different compute resources. We demonstrate BeeSwarm on a multi-physics HPC application with Travis CI, GitLab CI, and GitHub Actions, using ChameleonCloud and Google Compute Engine as the compute backends. Our results show that BeeSwarm can be used for scalability and performance testing of HPC applications.
Research Papers
Wed 17 Nov 2021 12:00 - 12:20 at Koala - Testing Applications Chair(s): Scott BarnettWith the rapid development of deep neural networks, machine translation has achieved significant progress. Machine translation has been integrated into people's daily life to assist in various tasks. However, machine translators, which are essentially one kind of software, also suffer from software defects. Translation errors can cause misunderstandings or even lead to threats to personal safety, marketing blunders, and political crises. Thus, almost all translation service providers have feedback channels to collect incorrect translations, alleviating the problem of scarce data resources and improving product performance. Inspired by syntax structure analysis methods, we introduce the constituency invariance, which reflects the structural similarity between a simple sentence and the sentences derived from it, to assist in the testing of machine translation. We implement the constituency invariance in an automated tool, CIT, that detects translation errors by checking the constituency invariance relation between translation results. CIT adopts constituency parse trees to represent the syntactic structures of sentences and employs an efficient data augmentation method to derive multiple new sentences from one sentence. To validate CIT, we experiment with three widely-used machine translators, i.e., Bing Microsoft Translator, Google Translate, and Youdao Translator. With 600 seed sentences as input, CIT detects 2212, 1910, and 1590 translation errors, respectively, with around 77% precision. We have submitted the detected errors to the development teams.
As of the submission of this paper, Google, Bing, and Youdao had fixed 15.4%, 32.0%, and 14.3% of the reported errors, respectively.
New Ideas and Emerging Results (NIER) track
Wed 17 Nov 2021 12:20 - 12:30 at Koala - Testing Applications Chair(s): Scott BarnettAutonomous Driving Systems (ADSs), which replace humans in driving vehicles, are complex software systems deployed in autonomous vehicles (AVs). Since the execution of ADSs relies heavily on maps, it is essential to perform global, map-based testing of ADSs to guarantee their correctness and AVs' safety in different situations. Existing methods focus more on specific scenarios than on global testing throughout the map. Testing on a global map is challenging, since the complex lane connections in a map can generate enormous numbers of scenarios. In this work, we propose ATLAS, an approach to ADS collision avoidance testing using map topology-based scenario classification. The key insight of ATLAS is to generate diverse testing scenarios by classifying junction lanes according to their topology-based interaction patterns. First, ATLAS divides the junction lanes in a map into classes such that an ADS executes similar collision avoidance maneuvers on the lanes in the same class. Second, for each class, ATLAS selects one junction lane to construct the testing scenario and generates test cases using a genetic algorithm. Finally, we implement and evaluate ATLAS on Baidu Apollo with the LGSVL simulator on the San Francisco map. Results show that ATLAS exposes nine types of real issues in Apollo 6.0 and reduces the number of junction lanes needed for testing by 98%.
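The per-class test generation step can be sketched with a tiny genetic algorithm over scenario parameters. The fitness function below is an invented stand-in; ATLAS derives fitness from actually simulating the ADS (e.g., from the minimum distance between vehicles):

```python
import random

# A tiny genetic algorithm over two scenario parameters (speed, trigger
# distance). Fitness here is a hypothetical proxy for "scenario severity".

def fitness(params):
    speed, trigger_dist = params
    # Hypothetical: a scenario stresses collision avoidance more when the
    # other vehicle is fast and triggered close to the junction.
    return speed - trigger_dist

def evolve(generations=30, pop_size=10, seed=1):
    rng = random.Random(seed)
    pop = [(rng.uniform(0, 20), rng.uniform(0, 50)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]           # elitist selection
        children = []
        for _ in range(pop_size - len(survivors)):
            a, b = rng.sample(survivors, 2)
            child = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)         # crossover
            child = (min(20, max(0, child[0] + rng.gauss(0, 1))),  # mutation
                     min(50, max(0, child[1] + rng.gauss(0, 2))))
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print(best)
```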
Industry Showcase
Wed 17 Nov 2021 12:30 - 12:40 at Koala - Testing Applications Chair(s): Scott BarnettEfficiently and sufficiently discovering the underlying structure of HMI software for testing, without any prior knowledge of the software logic, remains a difficult problem. The key challenge lies in the complexity of HMI software and the high variance in the coverage of current methods. In this paper, we introduce PathFinder, an effective and automatic HMI software exploration framework. PathFinder adopts a curiosity-based reinforcement learning framework to choose actions that lead to the discovery of more unknown states. Additionally, PathFinder progressively builds a navigation model during exploration to further improve state coverage. We have conducted experiments in both simulation and a real-world HMI software testing environment comprising the full tool chain of an automobile dashboard instrument cluster. The exploration coverage outperforms manual and fuzzing methods, which are the current industrial standards.
Research Papers
Wed 17 Nov 2021 19:00 - 19:20 at Koala - Android and Python Chair(s): Li Li Monash UniversityThe Android research community has long focused on building the permission specification for Android framework APIs, which can be referenced by app developers to request the necessary permissions for their apps. However, existing studies only analyze the permission specification for Java framework APIs in the Android SDK, whereas the permission specification for native framework APIs in the Android NDK remains unexplored. Since more and more apps implement their functionalities using native framework APIs, and the permission specification for these APIs is poorly documented, a permission specification analysis for the Android NDK is urgently needed. To fill this gap, in this paper we conduct the first permission specification analysis for the Android NDK. In particular, to automatically generate the permission specification, we design and develop PSGen, a new tool that statically analyzes the implementation of the Android framework and the Android kernel to correlate native framework APIs with their required permissions. Applying PSGen to 3 Android systems, including Android 9.0, 10.0, and 11.0, we find that PSGen can precisely build the permission specification. With the help of PSGen, we discover more than 200 native framework APIs that are correlated with at least one permission.
Research Papers
Wed 17 Nov 2021 19:20 - 19:40 at Koala - Android and Python Chair(s): Li Li Monash UniversityXML configuration files are widely used in Android to define an app's user interface and essential runtime information such as system permissions. As Android evolves, it may introduce functional changes in the configuration environment, causing compatibility issues that manifest as inconsistent app behaviors at different API levels. Such issues often induce software crashes and an inconsistent look-and-feel when running on certain Android versions. Existing works produce many false positive and false negative issue-detection rules because they conduct only shallow data-flow analysis and fail to model the XML tree hierarchies of Android configuration files. Besides, little is known about how changes in the Android framework induce such compatibility issues. To bridge these gaps, we conducted a systematic study of 196 real-world issues collected from 43 popular apps. We identified two common patterns of Android framework code changes that induce such configuration compatibility issues. Based on these findings, we propose ConfDroid, which automatically extracts rules for detecting configuration compatibility issues. The key intuition is to perform symbolic execution based on a model learned from the common code change patterns. Experiment results show that ConfDroid successfully extracts 274 valid issue-detection rules with a precision of 92.8%. Among them, 66 (24.1%) extracted rules can manifest issues that cannot be detected by the rules of state-of-the-art baselines; eleven of these lead to the detection of 127 reproducible configuration compatibility issues, which the baselines cannot detect, in 35 out of 316 real-world Android apps.
Research Papers
Wed 17 Nov 2021 19:40 - 20:00 at Koala - Android and Python Chair(s): Li Li Monash UniversityDynamic programming languages have been embracing gradual typing, which supports optional type annotations in source code. Type-annotating a complex and long-lived codebase is a gradual and expensive process, in which two issues have troubled developers. First, there is little guidance on how to implement type annotations, due to the existence of non-trivial type practices; second, there is little guidance on which portion of a codebase should be type-annotated first. To address these issues, this paper investigates the patterns of non-trivial type-annotation practices and the features of type-annotated code files. Our study detected six patterns of type-annotation practices, which involve recovering and expressing design concerns, and revealed three complementary features of type-annotated files. In addition, we implemented a tool for studying optional typing practice. Our work yields two guidelines for type annotation practices and tools in optional typing: 1) design and architecture concerns should be considered to improve the effectiveness and efficiency of non-trivial type practices, following at least the six patterns; 2) files critical to the software architecture could be type-annotated first, enabling a gradual migration from un-annotated codebases to annotated ones. We believe such guidance would promote a balance between the benefits of optional typing and the expense it requires.
Social/Networking
Wed 17 Nov 2021 20:00 - 21:00 at Koala - Ask Me Anything - Abhik Roychoudhury Chair(s): Yi Li Nanyang Technological University, Singapore
Abhik Roychoudhury is a Professor of Computer Science at National University of Singapore. His research focuses on software testing and analysis, software security and trust-worthy software construction. His research group has built scalable techniques for testing, debugging and repair of programs using systematic semantic analysis. The capability to automatically repair programs at a large scale contributes to the vision of self-healing software. He is currently directing the National Satellite of Excellence in Trustworthy Software Systems in Singapore. He is also the Lead Principal Investigator of the Singapore Cyber-security Consortium, which is a consortium of 25 companies in the cyber-security space engaging with academia for research and collaboration.
Research Papers
Wed 17 Nov 2021 21:00 - 21:20 at Koala - Fuzzing and Smells Chair(s): Xiaoyuan Xie School of Computer Science, Wuhan University, ChinaAs one of the most successful methods for vulnerability discovery, coverage-based greybox fuzzing relies on lightweight compile-time instrumentation to obtain fine-grained coverage feedback from the target program. Researchers have improved it by optimizing coverage metrics, without questioning the correctness of the instrumentation. However, instrumentation errors, including missed and redundant instrumentation locations, harm the ability of fuzzers. According to our experiments, this is a common and severe problem in various coverage-based greybox fuzzers and at different compiler optimization levels. In this paper, we design and implement InstruGuard, an open-source and pragmatic platform to find and fix instrumentation errors. It detects instrumentation errors by static analysis of target binaries and fixes them with a general solution based on binary rewriting. To study the impact of instrumentation errors and test our solution, we built a dataset of 15 real-world programs and selected 6 representative fuzzers as targets. We used InstruGuard to check and repair the instrumented binaries under different fuzzers and different compiler optimization options. To evaluate the effectiveness of the repair, we ran the fuzzers with the original instrumented programs and the repaired ones, and compared the fuzzing results in terms of execution paths, line coverage, and real bug findings. The results show that InstruGuard corrects the instrumentation errors of different fuzzers and helps to find more bugs in the dataset. Moreover, with fixed instrumentation but without any changes to the fuzzers themselves, we discovered one new zero-day vulnerability missed by the other fuzzers.
Research Papers
Wed 17 Nov 2021 21:20 - 21:40 at Koala - Fuzzing and Smells Chair(s): Xiaoyuan Xie School of Computer Science, Wuhan University, ChinaRobustness is a key concern in Rust library development, because Rust promises no risk of undefined behavior if developers use only safe APIs. Fuzzing is a practical approach for examining the robustness of programs. However, existing fuzzing tools are not directly applicable to library APIs due to the absence of fuzz targets; designing fuzz targets case by case mainly relies on human effort, which is labor-intensive. To address this problem, this paper proposes a novel automated fuzz target generation approach for fuzzing Rust libraries via API dependency graph traversal. We identify several essential requirements for library fuzzing, including validity and effectiveness of fuzz targets, high API coverage, and efficiency. To meet these requirements, we first employ breadth-first search with pruning to find API sequences under a length threshold, then we search backwards for longer sequences covering the remaining APIs, and finally we optimize the sequence set as a set covering problem. We implement our fuzz target generator and conduct fuzzing experiments with AFL++ on several popular real-world Rust projects. Our tool generates 7 to 118 fuzz targets for each library, with API coverage up to 0.92. We exercise each target with a time budget of 24 hours and find 30 previously unknown bugs across seven libraries.
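Both ingredients, breadth-first enumeration of API sequences and set-cover minimization, can be sketched on a toy API dependency graph. The graph and API names below are invented (a stack-like library with a single constructor); the paper's tool works on real Rust library APIs and adds backward search and pruning:

```python
from collections import deque

# api -> APIs that may directly precede it in a valid call sequence
# (toy graph; "new" is the only constructor, so every sequence starts there).
deps = {
    "new": [],
    "push": ["new", "push"],
    "pop": ["push", "pop"],
    "len": ["new", "push", "pop"],
}

def sequences_up_to(max_len):
    # BFS over call sequences: extend a sequence by any API that may
    # follow the current last call.
    seqs, queue = [], deque([("new",)])
    while queue:
        seq = queue.popleft()
        seqs.append(seq)
        if len(seq) < max_len:
            for api, preds in deps.items():
                if seq[-1] in preds:
                    queue.append(seq + (api,))
    return seqs

def greedy_cover(seqs, apis):
    # Greedy set cover: pick the sequence covering the most uncovered APIs.
    chosen, uncovered = [], set(apis)
    while uncovered:
        best = max(seqs, key=lambda s: len(uncovered & set(s)))
        chosen.append(best)
        uncovered -= set(best)
    return chosen

seqs = sequences_up_to(3)
targets = greedy_cover(seqs, deps)
print(targets)
```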
Research Papers
Wed 17 Nov 2021 21:40 - 22:00 at Koala - Fuzzing and Smells Chair(s): Xiaoyuan Xie School of Computer Science, Wuhan University, ChinaSimilar to production code, code smells also occur in test code, where they are called test smells. Test smells have a detrimental effect not only on test code but also on production code. To date, the majority of the research on test smells has focused on programming languages such as Java and Scala. However, no automated tools are available to support the identification of test smells for Python, despite its rapid growth in popularity in recent years. In this paper, we extend the research to Python, build a tool for detecting test smells in this language, and conduct an empirical analysis of test smells in Python projects. We started by gathering a list of test smells from existing research and selecting those that can be considered language-agnostic or that have similar functionality in Python's standard Unittest framework. In total, we identified 17 diverse test smells. Additionally, we searched for Python-specific test smells by mining frequent code change patterns that can be considered as either fixing or introducing test smells. Based on these changes, we proposed our own test smell called Suboptimal Assert. We developed a tool called PyNose in the form of a plugin for PyCharm, a popular Python IDE. We then conducted a large-scale empirical investigation of the prevalence of test smells in Python code. Our results show that 98% of the projects and 84% of the test suites in the studied dataset contain at least one test smell. Our proposed Suboptimal Assert smell was detected in as many as 70.7% of the projects.
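One pattern commonly associated with suboptimal asserts, assertTrue applied to a comparison where a dedicated assertion such as assertEqual would give better failure messages, can be detected with a few lines of the standard ast module. This is a simplified sketch in the spirit of PyNose, not its actual rule set:

```python
import ast

# Flag calls of the form self.assertTrue(<comparison>), a pattern often
# better written with a dedicated assertion (e.g., assertEqual).

def find_suboptimal_asserts(source):
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "assertTrue"
                and node.args
                and isinstance(node.args[0], ast.Compare)):
            findings.append(node.lineno)
    return findings

test_code = """
class TestMath(unittest.TestCase):
    def test_add(self):
        self.assertTrue(add(2, 2) == 4)   # smelly
        self.assertEqual(add(2, 3), 5)    # fine
"""
print(find_suboptimal_asserts(test_code))  # reports line 4 only
```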
Research Papers
Wed 17 Nov 2021 22:00 - 22:20 at Koala - Performance Chair(s): Ming Wen Huazhong University of Science and TechnologyAmong the daily tasks of database administrators (DBAs), analyzing query workloads to identify schema issues and improve performance is crucial. Although DBAs can easily identify queries that repeatedly cause performance issues, it remains challenging to automatically identify subsets of queries that share some properties (a pattern) and together drive some target measure, such as execution time. Patterns are defined over combinations of query clauses, environment variables, database alerts, and metrics, and help answer questions like: what makes SQL queries slow? What makes I/O communication high? Automatically discovering these patterns in a huge search space and providing them as hypotheses to help DBAs localize issues and root causes is a real problem for explainable AI. To tackle it, we introduce an original approach rooted in Subgroup Discovery. We show how to instantiate and develop this generic data-mining framework to identify potential causes of SQL workload issues. We believe that such a data-mining technique is not trivial for DBAs to apply, so we also provide a visualization tool for interactive knowledge discovery. We analyse a one-week workload from hundreds of databases at our company, make both the dataset and the source code available, and experimentally show that insightful hypotheses can be discovered.
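A minimal form of subgroup discovery can be sketched directly: enumerate conjunctions of attribute=value conditions over query descriptions and score each subgroup by how far it shifts a target measure from the global mean, weighted by subgroup size. The data and quality function below are illustrative only; the paper's approach is considerably richer:

```python
from itertools import combinations

# Toy query descriptions with a target measure (execution time in seconds).
queries = [
    {"has_join": True,  "uses_index": False, "time": 9.0},
    {"has_join": True,  "uses_index": False, "time": 8.0},
    {"has_join": True,  "uses_index": True,  "time": 3.0},
    {"has_join": False, "uses_index": True,  "time": 1.0},
    {"has_join": False, "uses_index": False, "time": 2.0},
]

def quality(rows, subset, target="time"):
    # Size-weighted mean shift; contradictory conditions yield empty
    # subsets, which score 0.
    if not subset:
        return 0.0
    overall = sum(r[target] for r in rows) / len(rows)
    mean = sum(r[target] for r in subset) / len(subset)
    return len(subset) * (mean - overall)

def best_subgroup(rows, attrs, max_conds=2, target="time"):
    conditions = [(a, v) for a in attrs for v in {r[a] for r in rows}]
    best, best_q = None, float("-inf")
    for k in range(1, max_conds + 1):
        for conj in combinations(conditions, k):
            subset = [r for r in rows if all(r[a] == v for a, v in conj)]
            q = quality(rows, subset, target)
            if q > best_q:
                best, best_q = conj, q
    return best, best_q

pattern, q = best_subgroup(queries, ["has_join", "uses_index"])
print(pattern, q)  # joins without an index make queries slow here
```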
Research Papers
Wed 17 Nov 2021 22:20 - 22:40 at Koala - Performance Chair(s): Ming Wen Huazhong University of Science and TechnologyService reliability is one of the key challenges that cloud providers have to deal with. In cloud systems, unplanned service failures may cause severe cascading impacts on their dependent services, deteriorating customer satisfaction. Predicting cascading impacts accurately and efficiently is critical to the operation and maintenance of cloud systems. Existing approaches identify whether one service depends on another via distributed tracing, but no prior work has focused on discriminating the intensity of the dependency between cloud services. In this paper, we empirically study the outages and the failure diagnosis procedure in two cloud providers to motivate a definition of dependency intensity. We then propose AIM, the first approach to predict the intensity of dependencies between cloud microservices. AIM first generates a set of candidate dependency pairs from tracing spans. It then represents the status of each cloud service with a multivariate time series aggregated from the spans. With this representation, AIM calculates the similarities between the statuses of the caller and the callee of each candidate pair, and finally aggregates the similarities into a unified value representing the intensity of the dependency. We evaluate AIM on data collected from an open-source microservice benchmark and from a cloud system in production. The experimental results show that AIM can efficiently and accurately predict the intensity of dependencies. We further demonstrate the usefulness of our method in a large-scale cloud system. We plan to release both datasets to facilitate future studies.
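The core signal such an approach relies on can be illustrated simply: if a caller's per-minute status series moves together with a callee's, the dependency is plausibly intense. Pearson correlation is one basic similarity measure (illustrative only; AIM aggregates several similarities over multivariate series derived from tracing spans):

```python
import math

def pearson(xs, ys):
    # Pearson correlation between two equal-length series.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical error rates per minute for a caller and two callees.
caller   = [0.1, 0.1, 0.8, 0.9, 0.2, 0.1]
callee_a = [0.0, 0.1, 0.7, 0.8, 0.1, 0.1]   # moves with the caller
callee_b = [0.3, 0.2, 0.3, 0.2, 0.3, 0.2]   # unrelated

print(pearson(caller, callee_a), pearson(caller, callee_b))
```

A high correlation for callee_a and a near-zero one for callee_b would suggest an intense dependency on the former only.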
Journal-first Papers
Wed 17 Nov 2021 22:40 - 22:50 at Koala - Performance Chair(s): Ming Wen Huazhong University of Science and TechnologySentiment analysis methods have become popular for investigating human communication, including discussions related to software projects. Since general-purpose sentiment analysis tools do not fit well with the information exchanged by software developers, new tools specific to software engineering (SE) have been developed. We investigate to what extent off-the-shelf SE-specific tools for sentiment analysis mitigate the threats to conclusion validity of empirical studies in software engineering highlighted by previous research.
First, we replicate two studies addressing the role of sentiment in security discussions on GitHub and in question-writing on Stack Overflow. Then, we extend the previous studies by assessing to what extent the tools agree with each other and with the manual annotation on a gold standard of 600 documents. We find that different SE-specific sentiment analysis tools might lead to contradictory results at a fine-grained level when used off-the-shelf. Moreover, platform-specific tuning or retraining might be needed to take into account differences in platform conventions, jargon, or document lengths.
Tool Demonstrations
Tue 16 Nov 2021 22:00 - 22:02 at Kangaroo - Tool Demo (1) Chair(s): Sridhar Chimalakonda RISHA Lab, Indian Institute of Technology, Tirupati
Managing large and fast-evolving software systems can be a challenging task. Numerous solutions have been developed to assist in this process, enhancing software quality and reducing development costs. These techniques—e.g., regression test selection and change impact analysis—are often built as standalone tools, unable to share or reuse information among them. In this paper, we introduce a software evolution management engine, EVOME, to streamline and simplify the development of such tools, allowing them to be easily prototyped using an intuitive query language and quickly deployed for different types of projects. EVOME is based on a differential factbase, a uniform, exchangeable representation of evolving software artifacts, and can be accessed directly through a Web interface. We demonstrate the usage and key features of EVOME on real open-source software projects. The demonstration video can be found at: http://youtu.be/6mMgu6rfnjY.
Pre-print
Tool Demonstrations
Tue 16 Nov 2021 22:10 - 22:12 at Kangaroo - Tool Demo (1) Chair(s): Sridhar Chimalakonda RISHA Lab, Indian Institute of Technology, Tirupati
Inspection of code changes is a time-consuming task that constitutes a big part of the everyday work of software engineers. Existing IDEs provide little information about the semantics of code changes within the file editor view. Therefore, developers have to track changes across multiple files, which is a hard task in large codebases.
In this paper, we present RefactorInsight, a plugin for IntelliJ IDEA that introduces a smart diff for code changes in Java and Kotlin where refactorings are auto-folded and provided with their description, thus allowing users to focus on changes that modify the code behavior, such as bug fixes and new features. RefactorInsight supports three usage scenarios: viewing smart diffs with auto-folded refactorings and hints, inspecting refactorings in pull requests and at any specific commit in the project change history, and exploring the refactoring history of methods and classes. The evaluation shows that commit processing time is acceptable: the median is less than 0.2 seconds, a delay that does not disrupt developers’ IDE workflows.
RefactorInsight is available at https://github.com/JetBrains-Research/RefactorInsight. The demonstration video is available at https://youtu.be/-6L2AKQ66nA.
Pre-print
Research Papers
Thu 18 Nov 2021 09:00 - 09:20 at Koala - Testing II Chair(s): Rui Abreu Faculty of Engineering, University of Porto, Portugal
Testing concurrent systems remains an uncomfortable problem for developers. The common industrial practice is to stress-test a system against large workloads, with the hope of triggering enough corner-case interleavings that reveal bugs. However, stress testing is often inefficient and its ability to get coverage of interleavings is unclear. In reaction, the research community has proposed the idea of systematic testing, where a tool takes over the scheduling of concurrent actions so that it can explore the space of interleavings.
We present an experience paper on the application of systematic testing to several case studies. We separate the algorithmic advancements in prior work (on searching the large space of interleavings) from the engineering of their tools. The latter was unsatisfactory; often the tools were limited to a small domain, hard to maintain, and hard to extend to other domains. We designed Nekara, an open-source cross-platform library for easily building custom systematic testing solutions.
We show that (1) Nekara can effectively encapsulate state-of-the-art exploration algorithms by evaluating on prior benchmarks, and (2) Nekara can be applied to a wide variety of scenarios, including existing open-source systems as well as cloud services of a major IT company. Nekara was easy to use, improved testing, and found multiple new bugs.
Research Papers
Thu 18 Nov 2021 09:20 - 09:40 at Koala - Testing II Chair(s): Rui Abreu Faculty of Engineering, University of Porto, Portugal
The past few years have witnessed the proliferation of quantum software stacks (QSSes) developed in response to rapid hardware advances in quantum computing. A QSS includes a quantum programming language, an optimizing compiler that compiles a quantum algorithm expressed in a high-level language into quantum gate instructions, a quantum simulator that emulates these instructions on a classical device, the control software that turns circuits into analog signals sent to the quantum computer, and execution on very expensive quantum hardware. In comparison to traditional compilers and architecture simulators, QSSes are difficult to test due to the probabilistic nature of results, the lack of clear hardware specifications, and quantum programming complexity. This work devises a novel differential testing approach for QSSes, named QDiff, with three major innovations: (1) We generate input programs to be tested via semantics-preserving, source-to-source transformations to explore program variants. (2) We speed up differential testing by filtering out quantum circuits that are not worthwhile to execute on quantum hardware by analyzing static characteristics such as circuit depth, 2-gate operations, gate error rates, and T1 relaxation time. (3) We design an extensible equivalence checking mechanism via distribution comparison functions such as the Kolmogorov–Smirnov test and cross entropy.
We evaluate QDiff with three widely-used open-source QSSes: Qiskit from IBM, Cirq from Google, and Pyquil from Rigetti. By running QDiff on both real hardware and quantum simulators, we found several critical bugs revealing potential instabilities in these platforms. QDiff’s source transformation is effective in producing semantically equivalent yet non-identical circuits (i.e., 34% of trials), and its filtering mechanism can speed up differential testing by 66%.
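The equivalence-checking idea above can be sketched with a two-sample Kolmogorov-Smirnov statistic over measurement outcomes. This is an illustration of the statistical test named in the abstract, not QDiff's code; the sample data and the acceptance threshold are assumptions.

```python
# Compare outcome distributions of two supposedly equivalent circuits
# with the two-sample KS statistic (max gap between empirical CDFs).

def ks_statistic(a, b):
    """Maximum distance between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    cdf = lambda s, x: sum(1 for v in s if v <= x) / len(s)
    return max(abs(cdf(a, x) - cdf(b, x)) for x in points)

# Hypothetical measurement outcomes (bitstrings as integers) from two runs.
run_a = [0, 0, 1, 3, 3, 3, 2, 0, 1, 3]
run_b = [0, 1, 3, 3, 0, 3, 2, 0, 3, 3]
d = ks_statistic(run_a, run_b)
print(d <= 0.3)  # below an (assumed) threshold: treated as equivalent
```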
Research Papers
Thu 18 Nov 2021 09:40 - 10:00 at Koala - Testing II Chair(s): Rui Abreu Faculty of Engineering, University of Porto, Portugal
Data scientists typically practice exploratory programming using computational notebooks to comprehend new data and extract insights. To do this, they iteratively refine their code, actively trying to reuse and re-purpose solutions created by other data scientists, in real time. However, recent studies have shown that a vast majority of publicly available notebooks cannot be executed out of the box. One of the prominent reasons is the deprecation of data science APIs used in such notebooks, due to the rapid evolution of data science libraries. In this work, we propose RELANCER, an automatic technique that restores the executability of broken Jupyter Notebooks, in near real time, by upgrading deprecated APIs. RELANCER employs an iterative, runtime-error-driven approach to identify and fix one API issue at a time. This is supported by a machine-learned model that uses the runtime error message to predict the kind of API repair needed: an update to an API or package name, a parameter, or a parameter value. Then RELANCER creates a search space of candidate repairs by combining knowledge from API migration examples on GitHub as well as the API documentation, and employs a second machine-learned model to rank this space of candidate mappings. An evaluation of RELANCER on a curated dataset of 255 un-executable Jupyter Notebooks from Kaggle shows that RELANCER can successfully restore the executability of 56% of the subjects, while baselines relying on just GitHub examples and just API documentation can only fix 37% and 36% of the subjects, respectively. Further, pursuant to its real-time use case, RELANCER can restore execution for 48% of subjects within a 5-minute time limit, while a baseline lacking its machine learning models can only fix 24%.
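The iterative, error-driven repair loop described above can be sketched as follows. The repair table is a made-up stand-in for RELANCER's learned models and mined GitHub/documentation knowledge; only the two scikit-learn deprecations themselves (`sklearn.cross_validation` moving to `sklearn.model_selection`, `.as_matrix()` replaced by `.values`) are real.

```python
# Sketch: run, map the error to a candidate repair, apply, retry.

# Assumed mapping from error-message keywords to candidate rewrites.
CANDIDATES = {
    "cross_validation": [("sklearn.cross_validation", "sklearn.model_selection")],
    "as_matrix": [(".as_matrix()", ".values")],
}

def run(code):
    """Toy executor: returns the first error keyword found, or None."""
    for kw in CANDIDATES:
        if kw in code:
            return kw
    return None

def repair(code, max_iters=10):
    """Fix one API issue per iteration until the code 'runs'."""
    for _ in range(max_iters):
        err = run(code)
        if err is None:
            return code                   # executable again
        old, new = CANDIDATES[err][0]     # top-ranked candidate repair
        code = code.replace(old, new)
    return code

broken = "from sklearn.cross_validation import KFold\nX = df.as_matrix()"
print(run(repair(broken)) is None)  # True: both deprecated APIs upgraded
```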
Social/Networking
Thu 18 Nov 2021 10:00 - 11:00 at Koala - Ask Me Anything - Miryung Kim Chair(s): Preetha Chatterjee Drexel University, USA
No description available.
Miryung Kim is a Full Professor in the Department of Computer Science at the University of California, Los Angeles and the Director of the Software Engineering and Analysis Laboratory. She is an expert on software evolution and is known for her research on code clones: code duplication detection, management, and removal solutions. She has taken a leadership role in creating and defining the emerging area of Software Engineering for Data Analytics (SE4DA) and organized a Dagstuhl Seminar on SE4ML. Please watch her video on SE4DA here.
She received her B.S. in Computer Science from the Korea Advanced Institute of Science and Technology in 2001 and her M.S. and Ph.D. in Computer Science and Engineering from the University of Washington in 2003 and 2008, respectively. She has received various awards, including an NSF CAREER award, a Google Faculty Research Award, and an Okawa Foundation Research Award. She was previously an assistant professor at the University of Texas at Austin. Her research is funded by the National Science Foundation, the Air Force Research Laboratory, Google, IBM, Intel, the Okawa Foundation, Samsung, and the Office of Naval Research. She was a keynote speaker at ASE 2019, is a Program Co-Chair of ESEC/FSE 2022, was a Program Co-Chair of ICSME 2019, and was an Associate Editor of IEEE Transactions on Software Engineering.
Research Papers
Thu 18 Nov 2021 11:00 - 11:20 at Koala - Apps Chair(s): Chunyang Chen Monash University
An increasing number of people are dependent on mobile devices to access data and complete essential tasks. For people with disabilities, mobile apps that violate accessibility guidelines can prevent them from carrying out these activities. Size-based inaccessibility is one of the top accessibility issues in mobile applications. These issues make apps difficult to use, especially for older people and people with motor disabilities. Existing accessibility-related techniques are limited in terms of helping developers resolve these issues. In this paper, we present our novel automated approach for repairing size accessibility issues in mobile applications. Our empirical evaluation showed that our approach was able to successfully resolve 99% of the reported size accessibility issues and received a high approval rating in a user study of the appearance of the repaired user interfaces.
Research Papers
Thu 18 Nov 2021 11:20 - 11:40 at Koala - Apps Chair(s): Chunyang Chen Monash University
The skyrocketing growth of mobile apps and mobile devices has significantly fueled the competition among developers. App developers have seized on app store capabilities as an opportunity to analyse data in order to provide improvement recommendations for the evolution of any given app. Previous research shows that app developers mostly rely on in-domain (i.e., same domain or same app) data to improve their apps. However, relying on in-domain data leads to low diversity and a lack of novelty in the recommended features. In this work, we present an approach to automatically classify, group, and rank popular features from cross-domain apps. We follow three steps: (1) identify cross-domain apps that are relevant to the target app in terms of feature relevance, and determine non-similar feature co-existence among relevant apps; (2) filter and group cross-domain features that are complementary and relevant to the target app using a semantic feature relevance technique; (3) rank and prioritize popular cross-domain features for adoption based on the distribution of domains, apps, popularity, and features in the relevant feature group. We ran extensive experiments on 100 target apps from 10 categories and 15,200 cross-domain apps from 31 categories. We found encouraging results; in particular, the semantic feature grouping technique outperforms two other baseline techniques. The empirical evaluation validates the efficacy of our approach in providing personalised feature recommendations to developers and enhancing user serendipity.
Research Papers
Thu 18 Nov 2021 11:40 - 12:00 at Koala - Apps Chair(s): Chunyang Chen Monash University
Writing UI tests manually requires significant effort. Several recent approaches have tried to address this problem in mobile apps: by exploiting the similarities of two different apps within the same domain (e.g., shopping apps) on a single platform (primarily Android), they have shown that it is possible to transfer tests that exercise similar functionality between the apps. An offshoot of this work has recently yielded a technique that transfers UI tests uni-directionally, from an open-source iOS app to the same app implemented for Android. This paper presents MAPIT, a technique that expands the existing body of work in three important ways: (1) MAPIT enables bi-directional UI test transfer between pairs of “sibling” Android and iOS apps; (2) MAPIT does not assume that the apps’ source code is available; (3) MAPIT is capable of transferring tests containing oracles in addition to UI events. MAPIT runs existing tests on a “source” app and builds an internal, partial model of the app corresponding to each test. The model comprises the user-visible features of the app (namely, screenshot bitmaps), the obtainable properties of each screenshot’s constituent elements (e.g., widget IDs), and the labeled transitions between the screenshots. MAPIT uses this model to determine the corresponding information on the “target” app and generates an equivalent test, via a novel approach that leverages computer vision and natural language processing. Our evaluation on a diverse set of widely used, closed-source sibling Android and iOS apps shows that MAPIT is feasible, accurate, and useful in transferring UI tests across platforms.
Pre-print
Research Papers
Thu 18 Nov 2021 18:00 - 18:20 at Koala - Configuration Chair(s): Maria Kechagia University College London
Large services depend on correct configuration to run efficiently and seamlessly. Checking such configuration for correctness has become a very important problem because services use a large and continuously increasing number of configuration files and parameters. Yet, very few such tools exist because the definition of correctness for a configuration parameter is seldom specified or documented, existing either as tribal knowledge among a few domain experts or not at all.
In this paper, we address the problem of configuration pattern mining: learning configuration rules from examples. Using program synthesis and a novel string profiling algorithm, we show that we can use file contents and histories of commits to learn patterns in configuration. We have built a tool called ConfMiner that implements configuration pattern mining and have deployed it on four large repositories containing configuration for a large-scale enterprise service. Our evaluation shows that ConfMiner learns a large variety of configuration rules with high precision and is very useful in flagging anomalous configuration.
Research Papers
Thu 18 Nov 2021 18:20 - 18:40 at Koala - Configuration Chair(s): Maria Kechagia University College London
Status code mappings reveal state shifts of a program, mapping one status code to another. Due to careless programming or a lack of system-wide knowledge of the whole program, developers can make incorrect mappings. Such errors are widely spread across modern software, and some of them have even become critical vulnerabilities. Unfortunately, existing solutions merely focus on single status code values, while never considering the relationships, that is, the mappings, among them. Therefore, it is imperative to propose an effective method to detect status code mapping errors.
In this paper, we propose Transcode to detect potential status code mapping errors. It first conducts value flow analysis to efficiently and precisely collect candidate status code values, that is, the integer values that are checked by the conditional comparisons that follow. Then, it aggregates correlated status codes according to whether they are propagated with the same variable. Finally, Transcode extracts mappings based on control dependencies and reports a mapping error if one status code is mapped to two others of the same kind. We have implemented Transcode as a prototype system and evaluated it on 5 real-world software projects, each of which contains on the order of a million lines of code. The experimental results show that Transcode is capable of handling large-scale systems in both a precise and efficient manner. Furthermore, it has discovered 59 new errors in the tested projects, among which 13 have been fixed by the community. We also deployed Transcode in WeChat, a widely-used instant messaging service, and have succeeded in finding real mapping errors in industrial settings.
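The reporting rule above (one status code mapped to two different codes of the same kind) can be sketched on already-extracted mappings. This illustrates only the final check, not Transcode's static analysis; the code names and kinds are made up.

```python
# Flag a source status code that maps to two distinct targets of one kind.
from collections import defaultdict

def find_mapping_errors(mappings, kind_of):
    """mappings: list of (src, dst) status-code pairs extracted from code."""
    seen = defaultdict(dict)                 # src -> {kind: dst}
    errors = []
    for src, dst in mappings:
        kind = kind_of[dst]
        prev = seen[src].get(kind)
        if prev is not None and prev != dst:
            errors.append((src, prev, dst))  # same kind, two targets
        else:
            seen[src][kind] = dst
    return errors

# Hypothetical codes: ENOENT maps to both HTTP_404 and HTTP_500 (conflict),
# while the log-level mapping is a different kind and therefore fine.
kind_of = {"HTTP_404": "http", "HTTP_500": "http", "LOG_WARN": "log"}
mappings = [("ENOENT", "HTTP_404"), ("ENOENT", "LOG_WARN"),
            ("ENOENT", "HTTP_500")]
print(find_mapping_errors(mappings, kind_of))
# -> [('ENOENT', 'HTTP_404', 'HTTP_500')]
```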
Research Papers
Thu 18 Nov 2021 18:40 - 19:00 at Koala - Configuration Chair(s): Maria Kechagia University College London
We present a new approach for synthesising Pareto-optimal Markov decision process (MDP) policies that satisfy complex combinations of quality-of-service (QoS) software requirements. These policies correspond to optimal designs or configurations of software systems, and are obtained by translating MDP models of these systems into parametric Markov chains, and using multi-objective genetic algorithms to synthesise Pareto-optimal parameter values that define the required MDP policies. We use case studies from the service-based systems and robotic control software domains to show that our MDP policy synthesis approach can handle a wide range of QoS requirement combinations unsupported by current probabilistic model checkers. Moreover, for requirement combinations supported by these model checkers, our approach generates better Pareto-optimal policy sets according to established quality metrics.
Research Papers
Thu 18 Nov 2021 19:00 - 19:20 at Koala - Bugs II Chair(s): Annibale Panichella Delft University of Technology
The smart pointer mechanism, which has been improved across successive versions of the C++ standard over the last decade, is designed to prevent memory-leak bugs by automatically deallocating the managed memory blocks. However, not all kinds of memory errors can be prevented by adopting this mechanism. For example, dereferencing a null smart pointer will lead to a software failure. Due to the lack of specialized support for smart pointers, off-the-shelf C++ static analyzers cannot effectively reveal these bugs.
In this paper, we propose a static approach to detecting memory-related bugs by tracking the heap memory management of smart pointers. The behaviors of smart pointers are modeled during their lifetime to trace the state transitions of managed memory blocks, and specially designed checkers verify the state changes against five collected error patterns. To evaluate the effectiveness of our approach, we implement it on top of the Clang Static Analyzer. A set of handmade code snippets, as well as nine popular open-source C++ projects, are used to compare our tool against four other analyzers. The results show that our approach can successfully discover nearly all the built-in errors, and 442 out of 648 reports generated from the open-source projects are true positives after manual review, with bugs that dereference null smart pointers reported most frequently. To further confirm our reports, we designed patches for Aria2, Restbed, MySQL, and LLVM, in which seven pull requests covering 76 bug reports have been merged by the developers to date. The results indicate that pointers should still be used carefully even after migrating to smart pointers, and that static analysis upon specialized models can effectively detect such errors.
Research Papers
Thu 18 Nov 2021 19:20 - 19:40 at Koala - Bugs II Chair(s): Annibale Panichella Delft University of Technology
Markdown compilers are widely used for translating plain Markdown text into formatted text, yet they suffer from performance bugs that cause performance degradation and resource exhaustion. Currently, there is little knowledge and understanding about these performance bugs in the wild. In this work, we first conduct a comprehensive study of known performance bugs in Markdown compilers. We identify that the ways Markdown compilers handle the language’s context-sensitive features are the dominant root cause of performance bugs. To detect unknown performance bugs, we develop MdPerfFuzz, a fuzzing framework with a syntax-tree-based mutation strategy to efficiently generate test cases that manifest such bugs. It is equipped with an execution-trace similarity algorithm to de-duplicate the bug reports. With MdPerfFuzz, we successfully identified 216 new performance bugs in real-world Markdown compilers and applications. Our work demonstrates that performance bugs are a common, severe, yet previously overlooked security problem.
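The de-duplication idea in this abstract can be sketched as follows: two bug-triggering inputs whose execution traces are highly similar are treated as one report. Jaccard similarity over trace bigrams is an assumed stand-in for MdPerfFuzz's actual trace-similarity algorithm, and the trace contents are invented.

```python
# Sketch: de-duplicate performance-bug reports by trace similarity.

def bigrams(trace):
    """Set of adjacent call pairs in an execution trace."""
    return {(a, b) for a, b in zip(trace, trace[1:])}

def similar(t1, t2, threshold=0.8):
    """Jaccard similarity of trace bigrams, compared to a threshold."""
    g1, g2 = bigrams(t1), bigrams(t2)
    union = g1 | g2
    return (len(g1 & g2) / len(union) if union else 1.0) >= threshold

def dedupe(reports):
    """Keep only traces not similar to an already-kept one."""
    kept = []
    for trace in reports:
        if not any(similar(trace, k) for k in kept):
            kept.append(trace)
    return kept

r1 = ["parse", "emph", "emph", "emph", "link"]
r2 = ["parse", "emph", "emph", "emph", "emph", "link"]   # same hot loop
r3 = ["parse", "table", "row", "row", "cell"]            # different bug
print(len(dedupe([r1, r2, r3])))  # 2 distinct reports
```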
Pre-print
New Ideas and Emerging Results (NIER) track
Thu 18 Nov 2021 19:40 - 19:50 at Koala - Bugs II Chair(s): Annibale Panichella Delft University of Technology
Realistic benchmarks of reproducible bugs and fixes are vital to good experimental evaluation of debugging and testing approaches. However, until now there has been no suitable benchmark suite for systematically evaluating the debugging and testing methods of quantum programs. This paper proposes Bugs4Q, a benchmark of thirty-six real, manually validated Qiskit bugs from four popular Qiskit elements (Terra, Aer, Ignis, and Aqua), supplemented with test cases for reproducing the buggy behaviors. Bugs4Q also provides interfaces for accessing the buggy and fixed versions of the Qiskit programs and executing the corresponding test cases, facilitating reproducible empirical studies and comparisons of Qiskit program debugging and testing tools. (Bugs4Q is available at https://github.com/Z-928/Bugs4Q)
Tool Demonstrations
Thu 18 Nov 2021 19:50 - 19:55 at Koala - Bugs II Chair(s): Annibale Panichella Delft University of Technology
Given that quantum software testing is a new area of research, there is a lack of benchmark programs and bug repositories to assess the effectiveness of testing techniques. To this end, quantum mutation analysis focuses on systematically generating faulty versions of Quantum Programs (QPs), called mutants, using mutation operators. Such mutants can be used as benchmarks to assess the quality of the test cases in a test suite. Thus, we present Muskit, a quantum mutation analysis tool for QPs coded in IBM’s Qiskit language. Muskit defines mutation operators on gates of QPs and selection criteria to reduce the number of mutants to generate. Moreover, it allows for the execution of test cases on mutants and the generation of results for test analyses. Muskit is provided as a command-line interface, a GUI, and a web application. We validated Muskit by using it to generate and execute mutants for four QPs.
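Gate-level mutation in the spirit of Muskit can be sketched on a quantum program represented as a plain gate list. The representation, the substitution table, and the operator set (replace/delete) are assumptions for illustration, not Muskit's actual API or operators.

```python
# Sketch: generate single-edit mutants of a gate-list quantum program.

REPLACEMENTS = {"x": ["y", "z"], "h": ["x"]}   # assumed substitution table

def mutants(program):
    """Return all single-edit mutants: one gate replaced or deleted."""
    out = []
    for i, (gate, qubit) in enumerate(program):
        for alt in REPLACEMENTS.get(gate, []):          # gate replacement
            out.append(program[:i] + [(alt, qubit)] + program[i + 1:])
        out.append(program[:i] + program[i + 1:])       # gate deletion
    return out

# A Bell-state preparation circuit: H on qubit 0, then CNOT(0, 1).
bell = [("h", 0), ("cx", (0, 1))]
ms = mutants(bell)
print(len(ms))  # 3 mutants: h replaced by x, h deleted, cx deleted
```

Each mutant can then be run against a test suite; a test "kills" a mutant if the mutant's output distribution differs detectably from the original's.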
Tool Demonstrations
Thu 18 Nov 2021 19:55 - 20:00 at Koala - Bugs II Chair(s): Annibale Panichella Delft University of Technology
The concept of a test smell represents potential problems with the readability and maintainability of test code. Common test smells focus on static aspects of the source code, such as code length and complexity. These are easy to detect and do not cause problems in terms of test execution. On the other hand, dynamic smells, which are based on test runtime behavior, lead to misunderstanding of test results. For example, rotten green tests give developers the false impression that a test passed without any problems, even though the test was poorly executed. Therefore, we should detect dynamic smells and take countermeasures as early as possible during development. In this paper, we introduce JTDog, a Gradle plugin for dynamic smell detection. JTDog is highly portable because it is integrated into the build tool. We applied JTDog to 150 projects on GitHub and confirmed the plugin’s portability. In addition, JTDog detected 958 dynamic smells in 55 projects. JTDog is available at https://github.com/kusumotolab/JTDog, and the demo video is available at https://youtu.be/t374HYMCavI.
Research Papers
Thu 18 Nov 2021 21:00 - 21:20 at Koala - Repositories Chair(s): Zeqi Lin Microsoft Research, China
Model transformations play a fundamental role in model-driven software development. They can be used to solve or support central tasks, such as creating models, handling model co-evolution, and model merging. In the past, various (semi-)automatic approaches have been proposed to derive model transformations from meta-models or from examples. These approaches require time-consuming handcrafting or recording of concrete examples, or they are not able to derive complex transformations. We propose a novel unsupervised approach, called OCKHAM, which is able to learn edit operations from models in model repositories. OCKHAM is based on the idea that a meaningful edit operation is one that can compress the model differences. We evaluate our approach in two controlled experiments and one real-world case study of a large-scale industrial model-driven architecture project in the railway domain. We find that our approach is able to discover frequent edit operations that have actually been applied. Furthermore, OCKHAM is able to extract edit operations in an industrial setting that are meaningful to practitioners.
Research Papers
Thu 18 Nov 2021 21:20 - 21:40 at Koala - Repositories Chair(s): Zeqi Lin Microsoft Research, China
People usually describe the key characteristics of software vulnerabilities in natural language mixed with domain-specific names and concepts. This textual nature poses a significant challenge for the automatic analysis of vulnerabilities. Automatic extraction of key vulnerability aspects is highly desirable but demands significant effort to manually label data for model training. In this paper, we propose an unsupervised approach to label and extract important vulnerability concepts in textual vulnerability descriptions (TVDs). We focus on three types of phrase-based vulnerability concepts (root cause, attack vector, and impact), as they are much more difficult to label and extract than name- or number-based entities (i.e., vendor, product, and version). Our approach is based on the key observation that phrases of the same type, no matter how much they differ in sentence structure and phrase expression, usually share syntactically similar paths in the sentence parsing trees. Therefore, we propose two path representations (absolute paths and relative paths) and use an auto-encoder to encode such syntactic similarities. To address the discrete nature of our paths, we enhance the traditional Variational Auto-Encoder (VAE) with the Gumbel-Max trick for categorical data distributions, thus creating a Categorical VAE (CaVAE). In the latent space of absolute and relative paths, we further apply FIt-SNE and clustering techniques to generate clusters of same-type concepts. Our evaluation confirms the effectiveness of our CaVAE for encoding path representations and the accuracy of the vulnerability concepts in the resulting clusters. In a concept classification task, our unsupervisedly labeled vulnerability concepts outperform the two manually labeled datasets from previous work.
Research Papers
Thu 18 Nov 2021 21:40 - 22:00 at Koala - Repositories Chair(s): Zeqi Lin Microsoft Research, China
We develop a static deadlock analysis for commercial Android Java applications, of sizes in the tens of millions of LOC, under active development at a major tech company. The analysis runs primarily at code-review time, on only the modified code and its dependents; we aim at reporting to developers in under 15 minutes.
To detect deadlocks in this setting, we first model the real language as an abstract language with balanced re-entrant locks, nondeterministic iteration and branching, and non-recursive procedure calls. We show that the existence of a deadlock in this abstract language is equivalent to a certain condition over the sets of so-called critical pairs of each program thread; these record, for all possible executions of the thread, which locks are currently held at the point when a fresh lock is acquired. Since the critical pairs of any program thread are finite and computable, the deadlock detection problem for our language is decidable, and in NP.
We then leverage these results to develop an open-source implementation of our analysis adapted to deal with real Java code. The core of the implementation is an algorithm which computes critical pairs in a compositional, abstract interpretation style, running in quasi-exponential time. Our analyser is built in the INFER verification framework and has been in industrial deployment for over two years; it has seen over two hundred fixed deadlock reports with a report fix rate of approximately 54%.
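The critical-pair condition described above can be illustrated in a simplified two-lock form: a thread's critical pairs record which locks are held when a fresh lock is acquired, and two threads may deadlock if one holds A while acquiring B and the other holds B while acquiring A. This is a sketch of the condition on toy lock-event sequences, not the INFER-based analyser itself.

```python
# Sketch: compute critical pairs per thread, then check for inversion.

def critical_pairs(events):
    """events: sequence of ('acq'|'rel', lock). Returns {(held, acquired)}."""
    held, pairs = [], set()
    for op, lock in events:
        if op == "acq":
            for h in held:
                pairs.add((h, lock))   # h is held when lock is acquired
            held.append(lock)
        else:
            held.remove(lock)
    return pairs

def may_deadlock(t1, t2):
    """Lock-order inversion between two threads' critical pairs."""
    p1, p2 = critical_pairs(t1), critical_pairs(t2)
    return any((b, a) in p2 for a, b in p1)

thread1 = [("acq", "A"), ("acq", "B"), ("rel", "B"), ("rel", "A")]
thread2 = [("acq", "B"), ("acq", "A"), ("rel", "A"), ("rel", "B")]
print(may_deadlock(thread1, thread2))  # True: classic lock-order inversion
```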
Research Papers
Thu 18 Nov 2021 22:00 - 22:20 at Koala - Modelling Chair(s): Nimrod Busany Tel Aviv University
In agile software development, proper team structures and effort estimates are crucial to ensure the on-time delivery of software projects. Delivery performance can vary due to the influence of changes in teams, resulting in team dynamics that remain largely unexplored. In this paper, we explore the effects of various aspects of teamwork on delays in software deliveries. We conducted a case study at ABC and analyzed historical log data from 765,200 user stories and 571 teams to identify team factors characterizing delayed user stories. Based on these factors, we built models to predict the likelihood and duration of delays in user stories. The evaluation results show that the use of team-related features leads to a significant improvement in the predictions of delay, achieving on average 74%-82% precision, 78%-86% recall and 76%-84% F-measure. Moreover, our results show that team-related features can help improve the prediction of delay likelihood, while delay duration can be explained exclusively using them. Finally, training on recent user stories using a sliding window setting improves the predictive performance; our predictive models perform significantly better for teams that have been stable. Overall, our results indicate that planning in agile development settings can be significantly improved by incorporating team-related information and incremental learning methods into analysis/predictive models.
Pre-print
Research Papers
Thu 18 Nov 2021 22:20 - 22:40 at Koala - Modelling Chair(s): Nimrod Busany Tel Aviv University
Neural network models are having a significant impact on many real-world applications. However, the ever-increasing popularity and complexity of these models amplifies their security and privacy challenges, with privacy leakage from training data being one of the most prominent issues. In this context, prior studies proposed to analyze the abstraction behavior of neural network models, e.g., RNNs, to understand their robustness. However, the existing research rarely addresses privacy breaches caused by memorization in neural language models. To fill this gap, we propose a novel approach, DeepMemory, that analyzes the memorization behavior of a neural language model. We first construct a memorization-analysis-oriented model, taking both the training data and a neural language model as input. We then build a semantic first-order Markov model to bind the constructed memorization-analysis-oriented model to the training data in order to analyze the memorization distribution. Finally, we apply our approach to address data leakage issues associated with memorization and to assist with dememorization. We evaluate our approach on one of the most popular neural language models, the LSTM-based language model, with three public datasets, namely WikiText-103, WMT2017, and IWSLT2016. We find that sentences in the studied datasets with low perplexity are more likely to be memorized. Our approach achieves an average AUC of 0.73 in automatically identifying data leakage issues during assessment. Finally, with the assistance of our approach, the memorization risk of the neural language model can be mitigated by mutating training data without impacting the quality of the neural language models.
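The low-perplexity finding above can be illustrated with a toy model: sequences seen verbatim in the training data score lower perplexity than novel recombinations of the same words. The paper uses an LSTM language model; the add-alpha-smoothed bigram model, corpus, and sentences below are illustrative assumptions.

```python
# Sketch: training-set sequences get lower perplexity than novel ones.
import math
from collections import Counter

corpus = "the model stores the data . the model stores the data .".split()
bi = Counter(zip(corpus, corpus[1:]))   # bigram counts
uni = Counter(corpus)                   # unigram counts

def perplexity(sentence, alpha=0.1):
    """Add-alpha-smoothed bigram perplexity of a token list."""
    v = len(uni)
    log_p = 0.0
    for a, b in zip(sentence, sentence[1:]):
        p = (bi[(a, b)] + alpha) / (uni[a] + alpha * v)
        log_p += math.log(p)
    return math.exp(-log_p / max(len(sentence) - 1, 1))

seen = "the model stores the data".split()    # appears in training data
unseen = "the data stores the model".split()  # same words, novel order
print(perplexity(seen) < perplexity(unseen))  # True: memorized is lower
```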
Research Papers
Thu 18 Nov 2021 22:40 - 23:00 at Koala - Modelling Chair(s): Nimrod Busany Tel Aviv University
The Go programming language offers a wide range of primitives to coordinate lightweight threads, e.g., channels, waitgroups, and mutexes, all of which may cause concurrency bugs. Static checkers that guarantee the absence of bugs are essential to help programmers avoid these costly errors before their code is executed. However, existing tools either miss too many bugs or cannot handle large programs. To address these limitations, we propose a static checker for Go programs which relies on performing bounded model checking of their concurrent behaviours. In contrast to previous works, our approach deals with large code bases, supports programs that have statically unknown parameters, and is extensible to additional concurrency primitives. Our work includes a detailed presentation of the algorithm that extracts models from Go programs, an algorithm to automatically check programs with statically unknown parameters, and a large-scale evaluation of our approach. The latter shows that our approach outperforms the state-of-the-art.
Pre-print