Research Papers
Tue 12 Sep 2023 10:30 - 10:42 at Room E - Open Source and Software Ecosystems 1 Chair(s): Denys Poshyvanyk (William & Mary)

Research Papers
Tue 12 Sep 2023 10:42 - 10:54 at Room E - Open Source and Software Ecosystems 1 Chair(s): Denys Poshyvanyk (William & Mary)

The reuse and distribution of open-source software must be in compliance with its accompanying open-source license. In modern packaging ecosystems, maintaining such compliance is challenging because a package may have a complex multi-layered dependency graph with many packages, any of which may have an incompatible license. Although prior research finds that license incompatibilities are prevalent, empirical evidence is still scarce in some modern packaging ecosystems (e.g., PyPI). It also remains unclear how developers remediate the license incompatibilities in the dependency graphs of their packages (including direct and transitive dependencies), let alone any automated approaches. To bridge this gap, we conduct a large-scale empirical study of license incompatibilities and their remediation practices in the PyPI ecosystem. We find that 7.27% of the PyPI package releases have license incompatibilities and 61.3% of them are caused by transitive dependencies, causing challenges in their remediation; for remediation, developers can apply one of five strategies: migration, removal, pinning versions, changing their own licenses, and negotiation. Inspired by our findings, we propose SILENCE, an SMT-solver-based approach to recommend license incompatibility remediations with minimal costs in package dependency graphs. Our evaluation shows that the remediations proposed by SILENCE can match 19 historical real-world cases (except for migrations not covered by an existing knowledge base) and have been accepted by five popular PyPI packages whose developers were previously unaware of their license incompatibilities.
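The core optimization problem the abstract describes can be sketched in a few lines of Python. This is a brute-force stand-in for SILENCE's SMT encoding, and every package name, license rule, and cost below is invented for illustration:

```python
from itertools import product

# Toy dependency graph: package -> (license, direct dependencies).
# Names and licenses are illustrative, not taken from the paper's dataset.
GRAPH = {
    "mypkg": ("MIT", ["libA", "libB"]),
    "libA":  ("Apache-2.0", ["libC"]),
    "libB":  ("BSD-3-Clause", []),
    "libC":  ("GPL-3.0", []),  # incompatible with an MIT-licensed release
}

# Remediation actions and their (simplified) costs.
ACTIONS = {"keep": 0, "remove": 3, "migrate": 1}  # migrate = swap for a compatible alternative

def incompatible(root_license, dep_license):
    # Grossly simplified rule: copyleft deps are incompatible with permissive roots.
    return root_license in ("MIT", "BSD-3-Clause", "Apache-2.0") and dep_license.startswith("GPL")

def transitive_deps(root):
    seen, stack = [], list(GRAPH[root][1])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.append(p)
            stack.extend(GRAPH[p][1])
    return seen

def cheapest_remediation(root):
    """Enumerate action assignments and return the cheapest one that removes
    all incompatibilities (a stand-in for SILENCE's SMT minimization)."""
    deps = transitive_deps(root)
    root_license = GRAPH[root][0]
    best = None
    for combo in product(ACTIONS, repeat=len(deps)):
        plan = dict(zip(deps, combo))
        # A kept dep must be compatible; removed/migrated deps are assumed fixed.
        ok = all(plan[d] != "keep" or not incompatible(root_license, GRAPH[d][0]) for d in deps)
        cost = sum(ACTIONS[a] for a in plan.values())
        if ok and (best is None or cost < best[0]):
            best = (cost, plan)
    return best

cost, plan = cheapest_remediation("mypkg")
```

An SMT solver replaces the exhaustive enumeration with a minimization query over the same constraints, which is what makes the approach scale to real dependency graphs.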
Pre-print

Research Papers
Tue 12 Sep 2023 10:54 - 11:06 at Room E - Open Source and Software Ecosystems 1 Chair(s): Denys Poshyvanyk (William & Mary)

no description available
Tool Demonstrations
Tue 12 Sep 2023 11:06 - 11:18 at Room E - Open Source and Software Ecosystems 1 Chair(s): Denys Poshyvanyk (William & Mary)

Bus factor (BF) is a metric that tracks knowledge distribution in a project. It is the minimal number of engineers that have to leave for a project to stall. Although several algorithms for calculating the bus factor exist, only a few tools allow easy calculation of the bus factor and convenient analysis of the results for projects hosted on Git-based providers.
We introduce Bus Factor Explorer, a web application that provides an interface and an API to compute, export, and explore the Bus Factor metric via treemap visualization, simulation mode, and chart editor. It supports repositories hosted on GitHub and enables functionality to search repositories in the interface and process many repositories at the same time. Our tool allows users to identify the files and subsystems at risk of stalling in the event of developer turnover by analyzing the VCS history.
The application and its source code are publicly available on GitHub at https://github.com/JetBrains-Research/bus-factor-explorer. The demonstration video can be found on YouTube: https://youtu.be/uIoV79N14z8
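The definition above ("the minimal number of engineers that have to leave for a project to stall") can be made concrete with a small sketch. The ownership map and the ">50% of files abandoned" stall criterion below are invented simplifications, not the algorithm Bus Factor Explorer implements:

```python
from itertools import combinations

# Toy knowledge map: file -> set of developers who "own" knowledge of it.
# A file is abandoned when all of its knowledgeable developers leave; the
# project stalls when more than half of the files are abandoned.
OWNERSHIP = {
    "core/engine.py": {"alice"},
    "core/io.py":     {"alice", "bob"},
    "ui/view.py":     {"bob"},
    "docs/guide.md":  {"carol"},
}

def stalls(departed):
    abandoned = sum(1 for owners in OWNERSHIP.values() if owners <= departed)
    return abandoned > len(OWNERSHIP) / 2

def bus_factor():
    # Smallest group of developers whose departure stalls the project.
    devs = set().union(*OWNERSHIP.values())
    for k in range(1, len(devs) + 1):
        for group in combinations(sorted(devs), k):
            if stalls(set(group)):
                return k
    return len(devs)
```

Here losing alice and bob abandons three of the four files, so the bus factor is 2; real tools derive the ownership sets from the VCS history rather than a hand-written map.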
Research Papers
Tue 12 Sep 2023 11:30 - 11:42 at Room E - Open Source and Software Ecosystems 1 Chair(s): Denys Poshyvanyk (William & Mary)

Issue-commit links, as a type of software traceability links, play a vital role in various software development and maintenance tasks. However, they are typically deficient, as developers often forget or fail to create tags when making commits. Existing studies have deployed deep learning techniques, including pre-trained models, to improve automatic issue-commit link recovery. Despite their promising performance, we argue that previous approaches have four main problems, hindering them from recovering links in large software projects. To overcome these problems, we propose an efficient and accurate pre-trained framework called EALink for issue-commit link recovery. EALink requires much fewer model parameters than existing pre-trained methods, bringing efficient training and recovery. Moreover, we design various techniques to improve the recovery accuracy of EALink. We construct a large-scale dataset and conduct extensive experiments to demonstrate the power of EALink. Results show that EALink outperforms the state-of-the-art methods by a large margin (15.23%-408.65%) on various evaluation metrics. Meanwhile, its training and inference overhead is orders of magnitude lower than existing methods. We provide our implementation and data at https://github.com/KDEGroup/EALink.
Pre-print

Research Papers
Tue 12 Sep 2023 11:42 - 11:54 at Room E - Open Source and Software Ecosystems 1 Chair(s): Denys Poshyvanyk (William & Mary)

no description available
Research Papers
Tue 12 Sep 2023 13:30 - 13:42 at Room E - Vulnerability and Security 1 Chair(s): Fatemeh Hendijani Fard (University of British Columbia)

no description available
Research Papers
Tue 12 Sep 2023 13:42 - 13:54 at Room E - Vulnerability and Security 1 Chair(s): Fatemeh Hendijani Fard (University of British Columbia)

The IFDS-based taint analysis employs two mutually iterative passes: a forward pass that identifies taints and a backward pass that detects aliases. This approach ensures both flow and context sensitivity, leading to remarkable precision. To preserve flow sensitivity, the IFDS-based taint analysis enhances data abstractions with activation statements that pinpoint the moment they acquire taint. Nonetheless, this mechanism can inadvertently introduce equivalent, yet redundant, value flows. This occurs when distinct activation statements are linked with the same data abstraction, resulting in unnecessary computational and memory-intensive demands on the analysis process.
We introduce MergeDroid, a novel approach to improve the efficiency of IFDS-based taint analysis by consolidating equivalent value flows. It merges the activation statements linked to the same data abstraction across the data facts that are reachable at a given program point during the backward pass. This process generates a representative symbolic activation statement applicable to all equivalent data facts, reducing them to a single symbolic data fact. During the forward pass, when this symbolic data fact returns to its point of creation, the analysis reverts to the original data facts alongside their initial activation statements. This merge-and-replay strategy eliminates redundant value flow propagation, resulting in performance gains. Furthermore, we also improve analysis efficiency and precision by leveraging context-sensitive insights from activation statements. Our evaluation on 40 Android apps demonstrates that MergeDroid significantly enhances IFDS-based taint analysis performance. On average, MergeDroid accelerates analysis by 9.0× while scalably handling 6 more apps. Additionally, it reduces false positives by significantly decreasing reported leak warnings, achieving an average reduction of 19.2%.
Pre-print File Attached

Research Papers
Tue 12 Sep 2023 13:54 - 14:06 at Room E - Vulnerability and Security 1 Chair(s): Fatemeh Hendijani Fard (University of British Columbia)

no description available
Research Papers
Tue 12 Sep 2023 14:06 - 14:18 at Room E - Vulnerability and Security 1 Chair(s): Fatemeh Hendijani Fard (University of British Columbia)

no description available
Journal-first Papers
Tue 12 Sep 2023 14:18 - 14:30 at Room E - Vulnerability and Security 1 Chair(s): Fatemeh Hendijani Fard (University of British Columbia)

Software vulnerabilities are weaknesses in source code that can be potentially exploited to cause loss or harm. While researchers have been devising a number of methods to deal with vulnerabilities, there is still a noticeable lack of knowledge on their software engineering life cycle, for example how vulnerabilities are introduced and removed by developers. This information can be exploited to design more effective methods for vulnerability prevention and detection, as well as to understand the granularity at which these methods should aim. To investigate the life cycle of known software vulnerabilities, we focus on how, when, and under which circumstances the contributions to the introduction of vulnerabilities in software projects are made, as well as how long they persist and how they are removed. We consider 3,663 vulnerabilities with public patches from the National Vulnerability Database (pertaining to 1,096 open-source software projects on GitHub) and define an eight-step process involving both automated parts (e.g., using a procedure based on the SZZ algorithm to find the vulnerability-contributing commits) and manual analyses (e.g., how vulnerabilities were fixed). The investigated vulnerabilities can be classified into 144 categories, take on average at least 4 contributing commits before being introduced, and half of them remain unfixed for more than one year. Most of the contributions are made by developers with a high workload, often during maintenance activities; vulnerabilities are mostly removed by adding new source code that implements further checks on inputs. We conclude by distilling practical implications on how vulnerability detectors should work to assist developers in timely identifying these issues.
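The SZZ step mentioned above has a simple core: the lines a fix commit deletes are blamed (as of the state just before the fix) to find the commits that contributed the vulnerable code. A minimal sketch, with all commits, files, and line numbers invented:

```python
# Minimal SZZ-style sketch: given the lines a fix commit deleted and a
# per-line "last modified by" map (what `git blame` would report just
# before the fix), the commits blamed for those lines are the candidate
# vulnerability-contributing commits.
BLAME_BEFORE_FIX = {            # (file, line_no) -> commit that last touched it
    ("auth.c", 10): "c1",
    ("auth.c", 11): "c7",
    ("auth.c", 42): "c7",
}

def contributing_commits(deleted_lines):
    """deleted_lines: iterable of (file, line_no) pairs removed by the fix."""
    return sorted({BLAME_BEFORE_FIX[loc] for loc in deleted_lines if loc in BLAME_BEFORE_FIX})

fix_deleted = [("auth.c", 10), ("auth.c", 42)]
```

Real SZZ variants add refinements (filtering cosmetic changes, handling fixes that only add code), which is why the paper combines the automated step with manual analysis.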
Link to publication DOI Pre-print

Research Papers
Tue 12 Sep 2023 14:30 - 14:42 at Room E - Vulnerability and Security 1 Chair(s): Fatemeh Hendijani Fard (University of British Columbia)

no description available
File Attached

NIER Track
Tue 12 Sep 2023 15:30 - 15:42 at Room E - Testing Tools and Techniques Chair(s): Tim Menzies (North Carolina State University)

no description available
NIER Track
Tue 12 Sep 2023 15:42 - 15:54 at Room E - Testing Tools and Techniques Chair(s): Tim Menzies (North Carolina State University)

Software developers handle many complex tasks that include gathering and applying domain knowledge, coordinating subtasks, designing interfaces, turning ideas into elegant code, and more. They must switch contexts between these tasks, incurring additional cognitive costs. Recent advances in large language models (LLMs) open up new possibilities for moving beyond the support provided by automated assistants (AAs) available today. In this paper, we explore whether a human memory model can provide a framework for the systematic investigation of AAs for software development based on LLMs and other new technologies.
Pre-print File Attached

Tool Demonstrations
Tue 12 Sep 2023 15:54 - 16:06 at Room E - Testing Tools and Techniques Chair(s): Tim Menzies (North Carolina State University)

Smart contracts are blockchain programs that often handle valuable assets. Writing secure smart contracts is far from trivial, and any vulnerability may lead to significant financial losses. To support developers in identifying and eliminating vulnerabilities, methods and tools for automated analysis have been proposed. However, the lack of commonly accepted benchmark suites and performance metrics makes it difficult to compare and evaluate such tools. Moreover, the tools are heterogeneous in their interfaces and reports as well as their runtime requirements, and installing several tools is time-consuming.
In this paper, we present SmartBugs 2.0, a modular execution framework. It provides a uniform interface to 19 tools aimed at smart contract analysis and accepts both Solidity source code and EVM bytecode as input. After describing its architecture, we highlight the features of the framework. We evaluate the framework via its reception by the community and illustrate its scalability by describing its role in a study involving 3.25 million analyses.
Pre-print File Attached

Research Papers
Tue 12 Sep 2023 16:06 - 16:18 at Room E - Testing Tools and Techniques Chair(s): Tim Menzies (North Carolina State University)

The rapid progress of modern computing systems has led to a growing interest in informative run-time logs. Various log-based anomaly detection techniques have been proposed to ensure software reliability. However, their implementation in the industry has been limited due to the lack of high-quality public log resources as training datasets. While some log datasets are available for anomaly detection, they suffer from limitations in (1) comprehensiveness of log events; (2) scalability over diverse systems; and (3) flexibility of log utility. To address these limitations, we propose AutoLog, the first automated log generation methodology for anomaly detection. AutoLog uses program analysis to generate run-time log sequences without actually running the system. AutoLog starts with probing comprehensive logging statements associated with the call graphs of an application. Then, it constructs execution graphs for each method after pruning the call graphs to find log-related execution paths in a scalable manner. Finally, AutoLog propagates the anomaly label to each acquired execution path based on human knowledge. It generates flexible log sequences by walking along the log execution paths with controllable parameters. Experiments on 50 popular Java projects show that AutoLog acquires significantly more (9x-58x) log events than existing log datasets from the same system, and generates log messages much faster (15x) with a single machine than existing passive data collection approaches. We hope AutoLog can facilitate the benchmarking and adoption of automated log analysis techniques.
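The path-walking step described above can be sketched in a few lines: enumerate paths through a per-method execution graph and emit the logging statements encountered on each path as a synthetic log sequence. The graph, node names, and messages below are invented, not AutoLog's actual data model:

```python
# Sketch of generating log sequences by walking an execution graph.
EXEC_GRAPH = {                # node -> successors
    "entry": ["check"],
    "check": ["ok", "fail"],
    "ok":    ["exit"],
    "fail":  ["exit"],
    "exit":  [],
}
LOG_AT = {                    # node -> logging statement, if any
    "entry": "request received",
    "fail":  "validation failed",
    "exit":  "request done",
}

def log_sequences(node="entry", prefix=()):
    """Enumerate every path from `node` and collect its log messages."""
    prefix = prefix + ((LOG_AT[node],) if node in LOG_AT else ())
    succs = EXEC_GRAPH[node]
    if not succs:
        return [list(prefix)]
    seqs = []
    for s in succs:
        seqs.extend(log_sequences(s, prefix))
    return seqs
```

Labeling then amounts to tagging the paths that traverse anomalous nodes (here, the `fail` branch) so the generated sequences come with ground truth.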
Pre-print

Research Papers
Tue 12 Sep 2023 16:18 - 16:30 at Room E - Testing Tools and Techniques Chair(s): Tim Menzies (North Carolina State University)

no description available
Research Papers
Tue 12 Sep 2023 16:30 - 16:42 at Room E - Testing Tools and Techniques Chair(s): Tim Menzies (North Carolina State University)

Recognizing software entities such as library names from free-form text is essential to enable many software engineering (SE) technologies, such as traceability link recovery, automated documentation, and API recommendation. While many approaches have been proposed to address this problem, they suffer from small entity vocabularies or noisy training data, hindering their ability to recognize software entities mentioned in sophisticated narratives. To address this challenge, we leverage the Wikipedia taxonomy to develop a comprehensive entity lexicon with 79K unique software entities in 12 fine-grained types, as well as a large labeled dataset of over 1.7M sentences. Then, we propose self-regularization, a noise-robust learning approach, to the training of our software entity recognition (SER) model by accounting for many dropouts. Results show that models trained with self-regularization outperform both their vanilla counterparts and state-of-the-art approaches on our Wikipedia benchmark and two Stack Overflow benchmarks. We release our models, data, and code for future research.
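One common way to regularize with "many dropouts" is to run the same input through the model twice with independent dropout masks and penalize divergence between the two predictions. The pure-Python sketch below illustrates that consistency penalty on a tiny invented linear model; it is my reading of the abstract, not the paper's actual loss:

```python
import math
import random

random.seed(0)

W = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]   # 3 features -> 2 classes (invented weights)

def forward(x, p_drop=0.5):
    # One stochastic forward pass: drop each input feature with prob p_drop.
    kept = [xi if random.random() > p_drop else 0.0 for xi in x]
    logits = [sum(k * W[i][j] for i, k in enumerate(kept)) for j in range(2)]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def consistency_loss(x):
    # Symmetric KL between two independently dropout-masked predictions.
    p, q = forward(x), forward(x)
    kl = lambda a, b: sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))
    return 0.5 * (kl(p, q) + kl(q, p))

loss = consistency_loss([1.0, 2.0, -1.0])
```

In training, this penalty is added to the usual classification loss, discouraging the model from memorizing label noise that only one dropout sub-network happens to fit.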
Pre-print File Attached

Journal-first Papers
Wed 13 Sep 2023 10:30 - 10:42 at Room E - Web Development 2 Chair(s): Hadar Ziv (University of California, Irvine)

Security testing aims at verifying that the software meets its security properties. In modern Web systems, however, this often entails the verification of the outputs generated when exercising the system with a very large set of inputs. Full automation is thus required to lower costs and increase the effectiveness of security testing. Unfortunately, to achieve such automation, in addition to strategies for automatically deriving test inputs, we need to address the oracle problem, which refers to the challenge, given an input for a system, of distinguishing correct from incorrect behavior (e.g., the response to be received after a specific HTTP GET request). In this paper, we propose Metamorphic Security Testing for Web-interactions (MST-wi), a metamorphic testing approach that integrates test input generation strategies inspired by mutational fuzzing and alleviates the oracle problem in security testing. It enables engineers to specify metamorphic relations (MRs) that capture many security properties of Web systems.
To facilitate the specification of such MRs, we provide a domain-specific language accompanied by an Eclipse editor. MST-wi automatically collects the input data and transforms the MRs into executable Java code to automatically perform security testing. It automatically tests Web systems to detect vulnerabilities based on the relations and collected data. We provide a catalog of 76 system-agnostic MRs to automate security testing in Web systems. It covers 39% of the OWASP security testing activities not automated by state-of-the-art techniques; further, our MRs can automatically discover 102 different types of vulnerabilities, which correspond to 45% of the vulnerabilities due to violations of security design principles according to the MITRE CWE database. We also define guidelines that enable test engineers to improve the testability of the system under test with respect to our approach.
We evaluated MST-wi's effectiveness and scalability with two well-known Web systems (i.e., Jenkins and Joomla). It automatically detected 85% of their vulnerabilities and showed a high specificity (99.81% of the generated inputs do not lead to a false positive); our findings include a new security vulnerability detected in Jenkins. Finally, our results demonstrate that the approach scales, thus enabling automated security testing overnight.
Replication package: https://zenodo.org/record/7702754#.ZCrt1_bMKUk
Toolset: https://github.com/MetamorphicSecurityTesting/MST
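As a rough illustration of the kind of MR the catalog contains, consider "the same request without the session must not yield the protected response". The toy server, paths, and check below are invented; MST-wi expresses MRs in its own DSL and executes them against real systems:

```python
# Hedged sketch of one metamorphic relation for Web security testing.
def toy_server(path, session=None):
    # Stand-in for the system under test.
    if path == "/admin":
        return ("200", "secret dashboard") if session == "valid" else ("403", "forbidden")
    return ("200", "public page")

def mr_auth_required(path):
    """MR: a follow-up request without the session cookie must not
    receive the same body as the authenticated source request."""
    source_status, source_body = toy_server(path, session="valid")
    _, follow_body = toy_server(path, session=None)
    if source_status == "200" and path.startswith("/admin"):
        return follow_body != source_body   # False would signal a vulnerability
    return True                             # relation does not apply to this input
```

The metamorphic trick is that no per-input oracle is needed: only the relation between the two responses is checked, which is what lets the approach scale over a large set of generated inputs.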
Link to publication DOI Pre-print

Journal-first Papers
Wed 13 Sep 2023 10:42 - 10:54 at Room E - Web Development 2 Chair(s): Hadar Ziv (University of California, Irvine)

no description available
File Attached

Research Papers
Wed 13 Sep 2023 10:54 - 11:06 at Room E - Web Development 2 Chair(s): Hadar Ziv (University of California, Irvine)

Identifying what front-end library runs on a web page is challenging. Although many mature detectors exist on the market, they suffer from false positives and the inability to detect libraries bundled by packers such as Webpack. Most importantly, the detection features they use are collected from developers’ knowledge, leading to an inefficient manual workflow and a large number of libraries that the existing detectors cannot detect. This paper introduces PTDETECTOR, which provides the first automated method for generating features and detecting libraries on web pages. We propose a novel data structure, the pTree, which we use as a detection feature. The pTree is well-suited for automation and addresses the limitations of existing detectors. We implement PTDETECTOR as a browser extension and test it on 200 top-traffic websites. Our experiments show that PTDETECTOR can identify packer-bundled libraries, and its detection results outperform existing tools.
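One way to picture a property-tree feature is as the shape of a library's exported object: a tree of property names that can be matched against the page's global object. The sketch below is my reading of the idea, with invented signatures, and represents JavaScript objects as nested Python dicts:

```python
# Sketch of a "property tree" detection feature.
def ptree(obj, depth=2):
    """Record the property-name shape of an object down to `depth` levels."""
    if depth == 0 or not isinstance(obj, dict):
        return {}
    return {k: ptree(v, depth - 1) for k, v in obj.items()}

def contains(tree, sig):
    """True if the signature tree is embedded in the observed tree."""
    return all(k in tree and contains(tree[k], sub) for k, sub in sig.items())

# Stand-in for a page's global object and a known library signature.
PAGE_GLOBALS = {"jQuery": {"fn": {"ajax": {}, "extend": {}}, "Deferred": {}}, "appState": {}}
JQUERY_SIG = {"jQuery": {"fn": {"ajax": {}}}}
```

Because the feature is just a structural shape, it can be generated automatically from a library's build output, which is what removes the manual feature-engineering step the abstract criticizes.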
Link to publication

Research Papers
Wed 13 Sep 2023 11:06 - 11:18 at Room E - Web Development 2 Chair(s): Hadar Ziv (University of California, Irvine)

WebAssembly (Wasm) is a bytecode format originally serving as a compilation target for Web applications. It has recently been used increasingly on the server side, e.g., providing a safer, faster, and more portable alternative to Linux containers. With the popularity of server-side Wasm applications, it is essential to study performance issues (i.e., abnormal latency) in Wasm runtimes, as they may cause a significant impact on server-side applications. However, there is still a lack of attention to performance issues in server-side Wasm runtimes. In this paper, we design a novel differential testing approach WarpDiff to identify performance issues in server-side Wasm runtimes. The key insight is that in normal cases, the execution time of the same test case on different Wasm runtimes should follow an oracle ratio. We identify abnormal cases where the execution time ratio significantly deviates from the oracle ratio and subsequently locate the Wasm runtimes that cause the performance issues. We apply WarpDiff to test five popular server-side Wasm runtimes using 123 test cases from the LLVM test suite and demonstrate the top 10 abnormal cases we identified. We further conduct an in-depth analysis of these abnormal cases and summarize seven performance issues, all of which have been confirmed by the developers. We hope our work can inspire future investigation on improving Wasm runtime implementation and thus promoting the development of server-side Wasm applications.
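The oracle-ratio insight is easy to sketch for two runtimes: the per-test time ratio should be roughly constant, so cases whose ratio deviates far from the median ratio are flagged. All timing numbers below are invented, and the real approach compares five runtimes, not two:

```python
from statistics import median

# Invented per-test execution times (seconds) on two Wasm runtimes.
times_a = {"t1": 10.0, "t2": 20.0, "t3": 30.0, "t4": 25.0}
times_b = {"t1":  5.0, "t2": 10.0, "t3": 15.0, "t4": 50.0}   # t4 is far off the usual 2:1 ratio

def abnormal_cases(a, b, tolerance=2.0):
    """Flag tests whose time ratio deviates from the median ratio by more
    than `tolerance` in either direction."""
    ratios = {t: a[t] / b[t] for t in a}
    oracle = median(ratios.values())
    return sorted(t for t, r in ratios.items()
                  if r / oracle > tolerance or oracle / r > tolerance)
```

The attraction of the ratio oracle is that no absolute performance model of either runtime is needed; only relative consistency across test cases is assumed.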
Pre-print File Attached

Industry Showcase (Papers)
Wed 13 Sep 2023 11:18 - 11:30 at Room E - Web Development 2 Chair(s): Hadar Ziv (University of California, Irvine)

Design review is an important initial phase of the software development life-cycle where stakeholders gain and discuss early insights into the design’s viability, discover potentially costly mistakes, and identify inconsistencies and inadequacies. For improved development velocity, it is important that design owners get their designs approved as quickly as possible.
In this paper, we discuss how engineering design reviews are typically conducted at Google, and propose a novel, structured, automated solution to improve design review velocity. Based on data collected on 141,652 approved documents authored by 41,030 users over four years, we show that our proposed solution decreases median time-to-approval by 25%, and provides further gains when used consistently. We also provide qualitative data to demonstrate our solution’s success, discuss factors that impact design review latency, propose strategies to tackle them, and share lessons learned from the usage of our solution.
Link to publication

NIER Track
Wed 13 Sep 2023 11:30 - 11:42 at Room E - Web Development 2 Chair(s): Hadar Ziv (University of California, Irvine)

no description available
Research Papers
Wed 13 Sep 2023 13:30 - 13:42 at Room E - Code Change Analysis Chair(s): Vladimir Kovalenko (JetBrains Research)

Adversarial examples are important to test and enhance the robustness of deep code models. As source code is discrete and has to strictly stick to complex grammar and semantics constraints, the adversarial example generation techniques in other domains are hardly applicable. Moreover, the adversarial example generation techniques specific to deep code models still suffer from unsatisfactory effectiveness due to the enormous ingredient search space. In this work, we propose a novel adversarial example generation technique (i.e., CODA) for testing deep code models. Its key idea is to use code differences between the target input (i.e., a given code snippet as the model input) and reference inputs (i.e., the inputs that have small code differences but different prediction results from the target input) to guide the generation of adversarial examples. It considers both structure differences and identifier differences to preserve the original semantics. Hence, the ingredient search space can be largely reduced to the one constituted by these two kinds of code differences, and thus the testing process can be improved by designing and guiding corresponding equivalent structure transformations and identifier renaming transformations. Our experiments on 15 deep code models demonstrate the effectiveness and efficiency of CODA, the naturalness of its generated examples, and its capability of enhancing model robustness after adversarial fine-tuning. For example, CODA reveals 88.05% and 72.51% more faults in models than the state-of-the-art techniques (i.e., CARROT and ALERT) on average, respectively.
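Identifier renaming, one of the two transformation kinds described above, preserves program semantics by construction. The sketch below applies a hand-supplied renaming with a naive token-level substitution (CODA picks names guided by reference inputs and works on real parsers, not regexes):

```python
import keyword
import re

def rename_identifiers(code, mapping):
    """Rename identifiers in Python code, leaving keywords untouched.
    Naive token regex; ignores strings/comments, fine for a toy snippet."""
    def sub(m):
        name = m.group(0)
        return name if keyword.iskeyword(name) else mapping.get(name, name)
    return re.sub(r"\b[A-Za-z_]\w*\b", sub, code)

snippet = """def total(values):
    acc = 0
    for v in values:
        acc += v
    return acc
"""
adversarial = rename_identifiers(snippet, {"acc": "running_sum", "values": "items"})

# The transformed code must still behave identically.
ns = {}
exec(adversarial, ns)
```

A deep code model that changes its prediction between `snippet` and `adversarial` has been fooled by a semantics-preserving change, which is exactly what counts as a revealed fault.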
Pre-print File Attached

Journal-first Papers
Wed 13 Sep 2023 13:42 - 13:54 at Room E - Code Change Analysis Chair(s): Vladimir Kovalenko (JetBrains Research)

The source code of successful projects is evolving all the time, resulting in hundreds of thousands of code changes stored in source code repositories. This wealth of data can be useful, e.g., to find changes similar to a planned code change or examples of recurring code improvements. This paper presents DiffSearch, a search engine that, given a query that describes a code change, returns a set of changes that match the query. The approach is enabled by three key contributions. First, we present a query language that extends the underlying programming language with wildcards and placeholders, providing an intuitive way of formulating queries that is easy to adapt to different programming languages. Second, to ensure scalability, the approach indexes code changes in a one-time preprocessing step, mapping them into a feature space, and then performs an efficient search in the feature space for each query. Third, to guarantee precision, i.e., that any returned code change indeed matches the given query, we present a tree-based matching algorithm that checks whether a query can be expanded to a concrete code change. We present implementations for Java, JavaScript, and Python, and show that the approach responds within seconds to queries across one million code changes, has a recall of 80.7% for Java, 89.6% for Python, and 90.4% for JavaScript, enables users to find relevant code changes more effectively than a regular expression-based search and GitHub’s search feature, and is helpful for gathering a large-scale dataset of real-world bug fixes.
The implementation and a web interface of DiffSearch are publicly available: http://diffsearch.software-lab.org
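A toy version of wildcard matching over code changes can be sketched with a regex translation. The `_` placeholder syntax and the flat "before -> after" change representation below are invented simplifications; DiffSearch's query language and tree-based matcher are far richer:

```python
import re

def query_to_regex(query):
    """Translate a query where `_` stands for any single expression into a
    regex (placeholder matches a run of identifier/call characters)."""
    parts = [re.escape(p) for p in query.split("_")]
    return re.compile("^" + r"[\w\.\(\)\[\] ]+".join(parts) + "$")

def search(query, changes):
    rx = query_to_regex(query)
    return [c for c in changes if rx.match(c)]

changes = [
    "x = foo(a) -> x = bar(a)",
    "y = foo(b) -> y = foo(b, 1)",
]
hits = search("_ = foo(_) -> _ = bar(_)", changes)
```

The paper's two-stage design (approximate feature-space retrieval, then exact tree matching) exists precisely because a purely textual match like this one cannot guarantee precision on real ASTs.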
Link to publication DOI Pre-print File Attached

Research Papers
Wed 13 Sep 2023 13:54 - 14:06 at Room E - Code Change Analysis Chair(s): Vladimir Kovalenko (JetBrains Research)

Research Papers
Wed 13 Sep 2023 14:06 - 14:18 at Room E - Code Change Analysis Chair(s): Vladimir Kovalenko (JetBrains Research)

Static Program Analysis (SPA) has long been established as an important technique for gaining insights into software systems. Over the last years, analysis designers have increasingly produced analyses that are compositional, collaborative, or incremental in nature - thus relying on some form of existing results to increase performance or even precision. However, systematic result reuse is still rare in this field even though the analyzed software is mainly composed of reusable software components.
For this work, we study 40 state-of-the-art SPA implementations and find that there is a tremendous potential for reusing analysis results. We attribute this to the fact that there is no systematic process in place for persisting and sharing analysis results and propose such a process here to fill this gap. In this paper, we present SPARRI, a prototype implementation providing an HTTP API to publish, search, and reuse SPA results. Our evaluation shows that reusing existing results with SPARRI can improve analysis performance by up to 92%. Furthermore, we see potential in applying it to other research areas like empirical software studies, benchmark creation, and artifact evaluation.
Pre-print

Research Papers
Wed 13 Sep 2023 14:18 - 14:30 at Room E - Code Change Analysis Chair(s): Vladimir Kovalenko (JetBrains Research)

no description available
Pre-print

Research Papers
Wed 13 Sep 2023 14:30 - 14:42 at Room E - Code Change Analysis Chair(s): Vladimir Kovalenko (JetBrains Research)

Tool Demonstrations
Wed 13 Sep 2023 15:30 - 15:42 at Room E - Autonomous Systems and Agents Chair(s): Alessio Gambi (IMC University of Applied Sciences Krems)

Software systems for safety-critical systems like self-driving cars (SDCs) need to be tested rigorously. Especially electronic control units (ECUs) of SDCs should be tested with realistic input data. In this context, a communication protocol called Controller Area Network (CAN) is typically used to transfer sensor data to the SDC control units. A challenge for SDC maintainers and testers is the need to manually define the CAN inputs that realistically represent the state of the SDC in the real world. To address this challenge, we developed TEASER, a tool that generates realistic CAN signals for SDCs obtained from sensors from state-of-the-art car simulators. We evaluated TEASER based on its integration capability into a DevOps pipeline of aicas GmbH, a company in the automotive sector. Concretely, we integrated TEASER in a Continuous Integration (CI) pipeline configured with Jenkins. The pipeline executes the test cases in simulation environments and sends the sensor data over the CAN bus to a physical CAN device, which is the test subject. Our evaluation shows the ability of TEASER to generate and execute CI test cases that expose simulation-based faults (using regression strategies); the tool produces CAN inputs that realistically represent the state of the SDC in the real world. This result is of critical importance for increasing automation and effectiveness of simulation-based CAN bus regression testing for SDC software.
Tool: https://doi.org/10.5281/zenodo.7964890
GitHub: https://github.com/christianbirchler-org/sdc-scissor/releases/tag/v2.2.0-rc.1
Documentation: https://sdc-scissor.readthedocs.io
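Turning a simulated sensor reading into a CAN frame payload is mostly a matter of scaling and byte packing. The message layout below (ID 0x123, little-endian, speed in 0.01 km/h units in bytes 0-1) is invented for illustration, not TEASER's actual signal definition:

```python
import struct

def encode_speed_frame(speed_kmh):
    """Pack a speed reading into an 8-byte CAN data field."""
    raw = round(speed_kmh * 100)            # scale: 0.01 km/h per bit
    payload = struct.pack("<Hxxxxxx", raw)  # 2 data bytes + 6 padding bytes
    return 0x123, payload                   # (CAN ID, data field)

def decode_speed_frame(payload):
    (raw,) = struct.unpack("<H", payload[:2])
    return raw / 100.0

can_id, data = encode_speed_frame(87.5)
```

In a real setup the layout would come from a DBC file and the frame would be written to a CAN interface; the sketch only shows the signal-to-bytes step that connects simulator output to the bus.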
Pre-print File Attached

Journal-first Papers
Wed 13 Sep 2023 15:42 - 15:54 at Room E - Autonomous Systems and Agents Chair(s): Alessio Gambi (IMC University of Applied Sciences Krems)

no description available
File Attached

NIER Track
Wed 13 Sep 2023 15:54 - 16:06 at Room E - Autonomous Systems and Agents Chair(s): Alessio Gambi (IMC University of Applied Sciences Krems)

no description available
File Attached

Research Papers
Wed 13 Sep 2023 16:06 - 16:18 at Room E - Autonomous Systems and Agents Chair(s): Alessio Gambi (IMC University of Applied Sciences Krems)

Tool Demonstrations
Wed 13 Sep 2023 16:18 - 16:30 at Room E - Autonomous Systems and Agents Chair(s): Alessio Gambi (IMC University of Applied Sciences Krems)

Modern software systems heavily rely on external libraries developed by third-parties to ensure efficient development. However, frequent library upgrades can lead to compatibility issues between the libraries and their client systems. In this paper, we introduce CompSuite, a dataset that includes 123 real-world Java client-library pairs where upgrading the library causes an incompatibility issue in the corresponding client. Each incompatibility issue in CompSuite is associated with a test case authored by the developers, which can be used to reproduce the issue. The dataset also provides a command-line interface that simplifies the execution and validation of each issue. With this infrastructure, users can perform an inspection of any incompatibility issue with the push of a button, or reproduce an issue step-by-step for a more detailed investigation. We make CompSuite publicly available to promote open science. We believe that various software analysis techniques, such as compatibility checking, debugging, and regression test selection, can benefit from CompSuite.
Pre-print File Attached

Research Papers
Wed 13 Sep 2023 16:30 - 16:42 at Room E - Autonomous Systems and Agents Chair(s): Alessio Gambi (IMC University of Applied Sciences Krems)

Research Papers
Thu 14 Sep 2023 10:30 - 10:42 at Room E - Program Verification 2 Chair(s): Martin Kellogg (New Jersey Institute of Technology)

A wide range of verification methods have been proposed to verify the safety properties of deep neural networks ensuring that the networks function correctly in critical applications. However, many well-known verification tools still struggle with complicated network architectures and large network sizes. In this work, we propose a network reduction technique as a pre-processing method prior to verification. The proposed method reduces neural networks by eliminating stable ReLU neurons and transforming them into a sequential neural network consisting of ReLU and Affine layers, which can be handled by most verification tools. We instantiate the reduction technique on the state-of-the-art complete and incomplete verification tools, including alpha-beta-crown, VeriNet and PRIMA. Our experiments on a large set of benchmarks indicate that the proposed technique can significantly reduce neural networks and speed up existing verification tools. Furthermore, the experiment results also show that network reduction can improve the availability of existing verification tools on many networks by reducing them into sequential neural networks.
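Stable ReLU neurons are typically found with bound propagation: if a neuron's pre-activation is provably non-negative (or non-positive) over the whole input region, it behaves as the identity (or as zero) and can be folded away. A minimal interval-arithmetic sketch, with invented weights and input box (the paper's method uses its own bounding procedure):

```python
def interval_affine(lo, hi, W, b):
    """Propagate an input box [lo, hi] through y = Wx + b using intervals."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        l = bias + sum(w * (lo[i] if w >= 0 else hi[i]) for i, w in enumerate(row))
        h = bias + sum(w * (hi[i] if w >= 0 else lo[i]) for i, w in enumerate(row))
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi

def stable_neurons(lo, hi, W, b):
    """Classify each ReLU neuron as stably active, stably inactive, or unstable."""
    pre_lo, pre_hi = interval_affine(lo, hi, W, b)
    return ["active" if l >= 0 else "inactive" if h <= 0 else "unstable"
            for l, h in zip(pre_lo, pre_hi)]
```

Stably active neurons can be merged into the surrounding affine layers and stably inactive ones dropped entirely, leaving only the unstable neurons for the verifier to reason about.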
Pre-print File AttachedResearch Papers
Thu 14 Sep 2023 10:42 - 10:54 at Room E - Program Verification 2 Chair(s): Martin Kellogg New Jersey Institute of TechnologySMT solvers are used to check the satisfiability of logic formulas and have been applied in various crucial domains, including software verification, test case generation, and program synthesis. However, bugs hidden in SMT solvers can lead to severe consequences, causing erroneous results in these domains. Therefore, ensuring the reliability and robustness of SMT solvers is of critical importance. Despite several testing approaches proposed for SMT solvers, generating effective test formulas to comprehensively test SMT solvers remains a challenge. To address this challenge, we propose to port large language models (LLMs) to generate SMT formulas for fuzzing solvers. Specifically, the study presents a novel retrain-finetune pipeline to unleash the potential of language models to generate effective SMT formulas and to improve their generation performance through data augmentation. We implemented our approach as a practical fuzzing tool, named LAST, and extensively tested the state-of-the-art SMT solvers Z3, cvc5, and Bitwuzla. To date, LAST has successfully uncovered 65 genuine bugs in these solvers, 45 of which have been fixed by the developers.
Pre-print File AttachedTool Demonstrations
Thu 14 Sep 2023 10:54 - 11:06 at Room E - Program Verification 2 Chair(s): Martin Kellogg New Jersey Institute of TechnologyValidation of correctness proofs is an established procedure in software verification. While there are steady advances in the verification of more and more complex software systems, it becomes increasingly hard to determine which information is actually useful for validating a correctness proof. Usually, the central piece that verifiers struggle to come up with is good loop invariants. While a proof using inductive invariants is easy to validate, not all invariants used by verifiers are necessarily inductive. To alleviate this problem, we propose LIV, an approach that makes it easy to check whether the invariant information provided by the verifier is sufficient to establish an inductive proof. This is done by emulating a Hoare-style proof, splitting the program into Hoare triples and converting these into verification tasks that can themselves be efficiently verified by an off-the-shelf verifier. In case the validation fails, useful information about the failure reason can be extracted from the overview of which triples could be established and which were refuted. We show that our approach works by evaluating it on a state-of-the-art benchmark set. Demonstration video: https://youtu.be/mZhoGAa08Rk Reproduction package: https://doi.org/10.5281/zenodo.8289101
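The three Hoare triples behind an inductive-invariant proof (initiation, preservation, and exit implying the postcondition) can be illustrated with a toy checker; this is not LIV itself, and it discharges the triples by exhaustive checking over a small finite domain rather than by calling an off-the-shelf verifier.

```python
# Check the three obligations of an inductive-invariant proof for a loop
# `while cond: body`, over a small finite state domain (illustrative sketch).

def check_inductive(init_states, cond, body, inv, post, domain):
    base = all(inv(s) for s in init_states)                        # {pre} => I
    step = all(inv(body(s)) for s in domain if inv(s) and cond(s)) # {I ∧ c} body {I}
    exit_ok = all(post(s) for s in domain if inv(s) and not cond(s))  # I ∧ ¬c => post
    return base, step, exit_ok

# Example loop: i = 0; while i < 10: i += 2   with postcondition i == 10.
result = check_inductive(
    init_states=[0],
    cond=lambda i: i < 10,
    body=lambda i: i + 2,
    inv=lambda i: i % 2 == 0 and 0 <= i <= 10,  # candidate invariant
    post=lambda i: i == 10,
    domain=range(0, 12),
)
print(result)
```

If any of the three booleans is false, the failing triple pinpoints why the candidate invariant is not inductive, mirroring the per-triple feedback the abstract describes.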
Pre-print File AttachedTool Demonstrations
Thu 14 Sep 2023 11:06 - 11:18 at Room E - Program Verification 2 Chair(s): Martin Kellogg New Jersey Institute of TechnologyAbstraction is a key technology for proving the correctness of computer programs. There are many approaches available, but unfortunately, the various techniques are difficult to combine and the successful techniques have to be re-implemented again and again.
We address this problem with the tool CEGAR-PT, which views abstraction as program transformation and integrates different verification components off-the-shelf. The idea is to use existing components without having to change their implementation, while still adjusting the precision of the abstraction using the successful CEGAR approach. The approach is largely general: the only restriction is that the abstraction, given a precision that defines its level, transforms one program into another. The abstraction by program transformation can over-approximate the data flow (e.g., havoc some variables, use more abstract types) or the control flow (e.g., loop abstraction, slicing). Demonstration video: https://youtu.be/ASZ6hoq8asE
Pre-print File AttachedNIER Track
Thu 14 Sep 2023 11:18 - 11:30 at Room E - Program Verification 2 Chair(s): Martin Kellogg New Jersey Institute of TechnologyFuzzy logic is widely applied in various applications. However, verifying the correctness of fuzzy logic models can be difficult. This extended abstract presents our ongoing work on verifying fuzzy logic models. We treat a fuzzy logic model as a program and propose a verification method based on symbolic execution for fuzzy logic models. We have developed and implemented environment models for the common functions and inference rules in fuzzy logic models. Our preliminary evaluation shows the potential of our verification method.
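The "common functions and inference rules" that such environment models would need to cover can be sketched concretely; the triangular membership function and min-based rule conjunction below are standard fuzzy-logic building blocks, chosen here as illustrative assumptions rather than the authors' actual environment models.

```python
# Two standard fuzzy-logic primitives: a triangular membership function and
# a min-based (Mamdani-style) AND over rule antecedents (illustrative sketch).

def triangular(x, a, b, c):
    """Membership degree of x in the triangular fuzzy set with corners (a, b, c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge

def rule_strength(antecedents):
    """Fuzzy AND of rule antecedents, modeled as the minimum degree."""
    return min(antecedents)

cold = triangular(18.0, 10.0, 15.0, 20.0)    # degree to which 18° is "cold"
humid = triangular(70.0, 50.0, 80.0, 100.0)  # degree to which 70% is "humid"
print(rule_strength([cold, humid]))
```

A symbolic executor for fuzzy models would explore the branches inside functions like `triangular` (the rising edge, the falling edge, and the zero region) to reason about all membership outcomes.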
Pre-printResearch Papers
Thu 14 Sep 2023 11:30 - 11:42 at Room E - Program Verification 2 Chair(s): Martin Kellogg New Jersey Institute of TechnologyThe rapid development of deep learning has significantly transformed the ecology of the software engineering field. As new data continues to grow and evolve at an explosive rate, iteratively updating software built on neural networks has become a critical challenge. The continuous learning paradigm enables networks to incorporate new data and update accordingly without losing previous memories, resulting in a batch of new networks as candidates for software updating; however, these approaches merely select among the networks by empirically testing their accuracy and lack formal guarantees for the batch as a whole, especially in the presence of adversarial samples. Existing verification techniques, based on constraint solving, interval propagation, and linear approximation, provide formal guarantees but are designed to verify the properties of individual networks rather than a batch of networks. To address this issue, we analyze the batch verification problem corresponding to several non-traditional machine learning paradigms and propose a framework named HOBAT (BATch verification for HOmogeneous structural neural networks) to enhance batch verification under reasonable assumptions about the representation of homogeneous-structure neural networks, increasing scalability in practical applications. Our method abstracts the neurons at the same position in a batch of networks into a single neuron, followed by an iterative refinement process on the abstracted neuron to restore precision until the desired properties are verified. Our method is orthogonal to bound propagation verification on a single neural network. To assess our methodology, we integrate it with bound propagation verification and observe significant improvements compared to the vanilla approach.
Our experiments demonstrate the enormous potential for verifying large batches of networks in the era of big data.
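The neuron-merging step described above can be pictured with a minimal sketch: the weight vectors of the same-position neuron across a batch of homogeneous networks are merged into a single neuron with interval weights, which refinement would later split. The interval representation and function name are assumptions for illustration, not HOBAT's actual encoding.

```python
# Merge one neuron's weight vectors across a batch of homogeneous networks
# into a single interval-weighted abstract neuron (illustrative sketch).

def abstract_position(weight_rows):
    """For each incoming weight, take the min/max over the batch as an interval."""
    return [(min(ws), max(ws)) for ws in zip(*weight_rows)]

# The same-position neuron's weights in three networks of a batch:
batch = [
    [0.9, -0.2, 0.5],
    [1.1, -0.1, 0.4],
    [1.0, -0.3, 0.6],
]
print(abstract_position(batch))
```

Verifying a property of the interval-weighted neuron soundly covers every network in the batch; when the abstraction is too coarse to prove the property, splitting an interval (refinement) restores precision at the cost of more verification work.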
File AttachedIndustry Showcase (Papers)
Thu 14 Sep 2023 13:30 - 13:42 at Room E - Debugging Chair(s): Carol Hanna University College LondonKeeping source code secure is a high priority for organizations, and code leaks sometimes originate from the actions of inside programmers. This industrial paper proposes MORDEn (Micro Organized Remote Development Environment) to prevent code leaks. MORDEn enables programmers to code and debug while physically separating secret code from their clients. We also present a showcase that demonstrates the feasibility of MORDEn through a case study project using it.
File AttachedShinobu Saito is a Distinguished Researcher in the Computer and Data Science Laboratories at the NTT Corporation (Tokyo, Japan). His research interests are software requirements engineering, design recovery, business modeling, and business process management. He received his Ph.D. in system engineering at Keio University (Yokohama, Japan) in 2007. Saito was a visiting researcher at the Institute for Software Research (ISR) at the University of California, Irvine from 2016 to 2018.
Research Papers
Thu 14 Sep 2023 13:42 - 13:54 at Room E - Debugging Chair(s): Carol Hanna University College LondonThird-party Python modules are usually implemented as binary extensions using native code (C/C++) to provide additional features and runtime acceleration. In native code, heap-allocated PyObjects are managed by the reference counting mechanism provided by the Python/C APIs for automatic reclamation. Hence, improper refcount manipulations can lead to memory leaks and use-after-free problems, which cannot be detected by simply pairing the occurrences of source and sink points. To detect such problems, state-of-the-art approaches have made groundbreaking contributions by identifying inappropriate final refcount values before returning from native code to Python. However, not all problems can be exposed at the end of a path. To detect those hidden in the middle of a path in native code, it is also crucial to track the lifecycle state of PyObjects through the refcount and lifecycle operations in API calls.
To achieve this goal, we propose the PyObject State Transition Model (PSTM) recording the lifecycle states and refcount values of PyObjects to describe the effects of Python/C API calls and pointer operations. We track state transitions of PyObjects with symbolic execution based on the model, and report problems when a statement triggers a transition to buggy states. The program state is also expanded to handle pointer nullity checks and smart pointers of PyObjects. We conduct experiments on 12 open-source projects and detect 259 real problems out of 280 reports, which is twice as many bugs as state-of-the-art approaches. We submit 168 real bugs to those active projects, and 106 issues are either confirmed or resolved.
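The state-tracking idea can be illustrated with a toy version of such a model; the states, transitions, and bug labels below are assumptions for illustration, not the paper's exact PSTM, and the sketch tracks a single object through `Py_INCREF`/`Py_DECREF`-style operations to flag use-after-free in the middle of a path and leaks at its end.

```python
# Toy lifecycle tracker for a PyObject-like value: transitions on incref/decref
# operations, reporting use-after-free mid-path and leaks at path end.

class ObjState:
    def __init__(self):
        self.refcount = 1      # a freshly created object holds one owned reference
        self.freed = False
        self.bugs = []

    def incref(self):
        if self.freed:
            self.bugs.append("use-after-free")   # touching a dead object
        else:
            self.refcount += 1

    def decref(self):
        if self.freed:
            self.bugs.append("use-after-free")   # extra DECREF on a dead object
            return
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True                    # object is reclaimed here

    def end_of_path(self):
        # A live object whose reference is about to go out of scope is a leak.
        if not self.freed and self.refcount > 0:
            self.bugs.append("possible-leak")
        return self.bugs

s = ObjState()
s.incref()
s.decref()
s.decref()
s.decref()          # one DECREF too many: the object was already freed
print(s.end_of_path())
```

Note that the fourth `decref` is caught at the statement that triggers it, not at the end of the path, which is exactly the class of problem the final-refcount approaches miss.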
Pre-printResearch Papers
Thu 14 Sep 2023 13:54 - 14:06 at Room E - Debugging Chair(s): Carol Hanna University College LondonDebugging performance anomalies in databases is challenging. Causal inference techniques enable qualitative and quantitative root cause analysis of performance downgrades. Nevertheless, causality analysis is challenging in practice, particularly due to limited observability. Recently, chaos engineering (CE) has been applied to test complex software systems. CE frameworks mutate chaos variables to inject catastrophic events (e.g., network slowdowns) to stress-test these software systems. The systems under chaos stress are then tested (e.g., via differential testing) to check if they retain normal functionality, such as returning correct SQL query outputs even under stress.
To date, CE is mainly employed to aid software testing. This paper identifies the novel usage of CE in diagnosing performance anomalies in databases. Our framework, PERFCE, has two phases — offline and online. The offline phase learns statistical models of a database using both passive observations and proactive chaos experiments. The online phase diagnoses the root cause of performance anomalies from both qualitative and quantitative aspects on-the-fly. In evaluation, PERFCE outperformed previous works on synthetic datasets and is highly accurate and moderately expensive when analyzing real-world (distributed) databases like MySQL and TiDB.
Pre-printResearch Papers
Thu 14 Sep 2023 14:06 - 14:18 at Room E - Debugging Chair(s): Carol Hanna University College LondonThe MAP (Mean Average Precision) metric is one of the most popular performance metrics in the field of Information Retrieval Fault Localization (IRFL). However, problematic implementations of the MAP metric are used in IRFL research. These implementations deviate from the textbook definition of MAP, rendering the metric sensitive to the truncation of retrieval results and to inaccuracies and impurities in the datasets used. Applying such a deviating metric can lead to performance overestimation, which poses a problem for the comparability, transferability, and validity of IRFL performance results. In this paper, we discuss the definition and mathematical properties of MAP and common deviations and pitfalls in its implementation. We investigate and discuss the conditions enabling such overestimation: the truncation of retrieval results in combination with ground truths spanning multiple files, and improper handling of undefined AP results. We demonstrate the overestimation effects using the Bench4BL benchmark and five well-known IRFL techniques. Our results indicate that a flawed implementation of the MAP metric can lead to an overestimation of IRFL performance, in extreme cases by up to 70%. We argue for strict adherence to the textbook version of MAP, with the extension that undefined AP values be set to 0, for all IRFL experiments. We hope that this work will help improve comparability and transferability in IRFL research.
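The convention argued for above corresponds to the textbook computation below: AP divides by the number of all relevant files (so truncating the ranking can only lower, never inflate, the score), and a query with an empty ground truth contributes AP = 0. The data in the example is made up for illustration.

```python
# Textbook Mean Average Precision: AP over the full ranking, normalized by
# ALL relevant items, with undefined AP (empty ground truth) set to 0.

def average_precision(ranked, relevant):
    if not relevant:
        return 0.0                              # undefined AP -> 0 by convention
    hits, precisions = 0, []
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / i)         # precision at each relevant hit
    return sum(precisions) / len(relevant)      # divide by all relevant docs

def mean_average_precision(queries):
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)

queries = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2 = 5/6
    (["x", "y"], set()),                 # AP = 0 by the convention above
]
print(mean_average_precision(queries))   # -> 5/12
```

A deviating implementation that divides by the number of relevant files *found within a truncated ranking* would instead reward truncation, which is the overestimation mechanism the paper analyzes.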
File AttachedResearch Papers
Thu 14 Sep 2023 14:18 - 14:30 at Room E - Debugging Chair(s): Carol Hanna University College LondonExisting search heuristics used to find input values that result in significant floating-point (FP) errors, or small ranges that cover them, come with severe constraints, complicating their implementation and restricting their general applicability. This paper introduces an error analysis tool called EIFFEL to infer error-inducing input ranges instead of searching for them. Given an FP expression with its domain D, EIFFEL first constructs an error data set by sampling values across a smaller domain R and assembles these data into clusters. If more than two clusters are formed, EIFFEL derives polynomial curves that best fit the bound coordinates of the error-inducing ranges in R, extrapolating them to infer all target ranges of D and reporting the maximal error. Otherwise, EIFFEL simply returns the largest error across R. Experimental results show that EIFFEL exhibits broader applicability than ATOMU and S3FP, successfully detecting the errors of all 70 considered benchmarks while the two baselines report errors for only some of them. By taking the inferred ranges of EIFFEL as input, Herbie obtains an average accuracy improvement of 3.35 bits and up to 53.3 bits.
File AttachedResearch Papers
Thu 14 Sep 2023 14:30 - 14:42 at Room E - Debugging Chair(s): Carol Hanna University College LondonInformation retrieval-based fault localization (IRFL) techniques have been proposed to identify the files likely to contain the faults that are the root causes of failures reported by users. These techniques have been extensively studied for accurately ranking source files; however, none of the existing approaches focus on the specific case of concurrent programs. This is a critical issue, since concurrency bugs are notoriously difficult to identify. To address this problem, this paper presents a novel approach called BLCoiR, which reformulates bug report queries to more accurately localize source files related to concurrency bugs. The key idea of BLCoiR is a novel knowledge graph (KG) representing the domain entities extracted from concurrency bug reports and their semantic relations. The KG is then transformed into the IR query to perform fault localization. BLCoiR leverages natural language processing (NLP) and concept modeling techniques to construct the knowledge graph. Specifically, NLP techniques are used to extract relevant entities from the bug reports, such as word entities related to concurrency constructs. These entities are then linked together based on their semantic relationships, forming the KG. We have conducted an empirical study on 692 concurrency bug reports from 44 real-world applications. The results show that BLCoiR outperforms existing IRFL techniques in terms of accuracy and efficiency in localizing concurrency bugs. BLCoiR demonstrates the effectiveness of using a knowledge graph to model domain entities and their relationships, providing a promising direction for future research in this area.
Pre-printIndustry Showcase (Papers)
Thu 14 Sep 2023 15:30 - 15:42 at Room E - Vulnerability and Security 2 Chair(s): Ben Hermann TU DortmundThe emergence of mobile technology has significantly advanced the banking sector in how consumers interact with their banks and manage their finances. The accessibility and ease of financial services have been improved by the switch from desktop banking to mobile banking. Mobile banking offers many advantages, but it also raises security concerns: illegal access to personal and financial information often occurs due to lapses in mobile security. In recent years, we have worked with banks from 10 countries and systematically analyzed 28 of their apps. We found several vulnerabilities in these apps through manual code reviews and by conducting 11 types of attacks. We then proposed and applied adequate security measures to protect these apps. In this paper, we report our experience and practice in securing these Android apps.
File AttachedJournal-first Papers
Thu 14 Sep 2023 15:42 - 15:54 at Room E - Vulnerability and Security 2 Chair(s): Ben Hermann TU DortmundFront-running attacks have been a major concern on the blockchain. Attackers launch front-running attacks by inserting additional transactions before upcoming victim transactions to manipulate victim transaction executions and make profits. Recent studies have shown that front-running attacks are prevalent on the Ethereum blockchain and have caused millions of US dollars loss. It is the vulnerabilities in smart contracts, which are blockchain programs invoked by transactions, that enable the front-running attack opportunities. Although techniques to detect front-running vulnerabilities have been proposed, their performance on real-world vulnerable contracts is unclear. There is no large-scale benchmark based on real attacks to evaluate their capabilities. We make four contributions in this paper. First, we design an effective algorithm to mine real-world attacks in the blockchain history. The evaluation shows that our mining algorithm is more effective and comprehensive, achieving higher recall in finding real attacks than the previous study. Second, we propose an automated and scalable vulnerability localization approach to localize code snippets in smart contracts that enable front-running attacks. The evaluation also shows that our localization approaches are effective in achieving higher precision in pinpointing vulnerabilities compared to the baseline technique. Third, we build a benchmark consisting of 513 real-world attacks with vulnerable code labeled in 235 distinct smart contracts, which is useful to help understand the nature of front-running attacks, vulnerabilities in smart contracts, and evaluate vulnerability detection techniques. Last but not least, we conduct an empirical evaluation of seven state-of-the-art vulnerability detection techniques on our benchmark. 
The evaluation experiment reveals the inadequacy of existing techniques in detecting front-running vulnerabilities, with a recall of at most 6.04%. Our further analysis identifies four common limitations of existing techniques: lack of support for inter-contract analysis, inefficient constraint solving for cryptographic operations, improper vulnerability patterns, and lack of token support.
Link to publication DOI Pre-print File AttachedIndustry Showcase (Papers)
Thu 14 Sep 2023 15:54 - 16:06 at Room E - Vulnerability and Security 2 Chair(s): Ben Hermann TU DortmundAutonomous agents equipped with Large Language Models (LLMs) are rapidly gaining prominence as a revolutionary technology within the realm of Software Engineering. These intelligent systems demonstrate the capacity to autonomously perform tasks and make independent decisions, leveraging their intrinsic reasoning and decision-making abilities. This paper examines the current state of autonomous agents in Software Engineering practice, including their capabilities, challenges, and opportunities, with special emphasis on Auto-GPT.
Research Papers
Thu 14 Sep 2023 16:06 - 16:18 at Room E - Vulnerability and Security 2 Chair(s): Ben Hermann TU DortmundDecentralized Finance (DeFi) apps have rapidly proliferated with the development of blockchain and smart contracts; their maximum total value locked (TVL) has exceeded 100 billion dollars in the past few years. These apps allow users to interact and perform complex financial activities. However, vulnerabilities hiding in the smart contracts of DeFi apps have resulted in numerous security incidents, most of them leading to leaked funds (tokens) and severe financial loss. In this paper, we summarize the Token Leaking vulnerability of DeFi apps, which enables attackers to abnormally withdraw funds far exceeding their deposits. Given the massive amount of funds in DeFi apps, it is crucial to protect them from Token Leaking vulnerabilities. Unfortunately, existing tools have limitations in addressing this vulnerability.
To address this issue, we propose DeFiWarder, a tool that traces on-chain transactions and protects DeFi apps from Token Leaking vulnerabilities. Specifically, DeFiWarder first records the execution logs (traces) of smart contracts. It then accurately recovers token transfers within transactions to capture the flow of funds between users and DeFi apps, as well as the relations between users, based on role mining. Finally, DeFiWarder uses anomaly detection to reveal Token Leaking vulnerabilities and related attack behaviors. We conducted experiments to demonstrate the effectiveness and efficiency of DeFiWarder. Specifically, DeFiWarder successfully revealed 25 Token Leaking vulnerabilities in 30 DeFi apps. Moreover, its efficiency supports real-time detection of token leaking within on-chain transactions. In addition, we summarize five major causes of Token Leaking vulnerabilities to help DeFi apps protect their funds.
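The anomaly-detection step can be pictured with a hypothetical sketch: reconstruct per-user token flows from transfer logs and flag any user whose withdrawals far exceed their deposits. The log format, threshold, and function names are illustrative assumptions, not DeFiWarder's actual implementation.

```python
# Flag Token Leaking suspects: users who withdraw far more from a DeFi app
# than they deposited, reconstructed from transfer logs (illustrative sketch).

from collections import defaultdict

def detect_leaks(transfers, app, ratio=1.5):
    """transfers: (sender, receiver, amount) tuples recovered from traces."""
    deposited = defaultdict(float)
    withdrawn = defaultdict(float)
    for sender, receiver, amount in transfers:
        if receiver == app:
            deposited[sender] += amount      # user -> app: deposit
        elif sender == app:
            withdrawn[receiver] += amount    # app -> user: withdrawal
    # Flag users whose withdrawals exceed the threshold ratio of deposits.
    return [u for u in withdrawn
            if withdrawn[u] > ratio * max(deposited[u], 1e-9)]

logs = [
    ("alice", "pool", 100.0),
    ("pool", "alice", 105.0),   # normal: deposit plus modest yield
    ("bob", "pool", 10.0),
    ("pool", "bob", 500.0),     # suspicious: 50x the deposit
]
print(detect_leaks(logs, "pool"))
```

A real system would additionally account for legitimate yield, roles (the relations between users mentioned above), and token exchange rates, which is why a simple ratio threshold alone is only a starting point.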
Research Papers
Thu 14 Sep 2023 16:18 - 16:30 at Room E - Vulnerability and Security 2 Chair(s): Ben Hermann TU DortmundIndustry Showcase (Papers)
Thu 14 Sep 2023 16:30 - 16:42 at Room E - Vulnerability and Security 2 Chair(s): Ben Hermann TU DortmundRecent breakthroughs in Large Language Models (LLMs), comprising billions of parameters, have unveiled exceptional insight into a wide range of Natural Language Processing (NLP) tasks. The performance of these models hinges on the sophistication and completeness of the input prompt. As such, minimizing the number of prompt refinements with improvised keywords becomes critically important, as it directly affects the time to market and the cost of the solution under development. However, this process inevitably involves a trade-off between the learning curve/proficiency of the user and the completeness of the prompt, as producing complete solutions is an incremental process. In this paper, we design a novel solution, implemented as a plugin for the Visual Studio Code IDE, that optimizes this trade-off by learning the underlying prompt intent and enhancing the prompt with keywords. The plugin aligns with developers' semantics while they develop secure code, ensuring that parameter and local variable names, return expressions, simple pre- and post-conditions, and basic control and data flow are addressed.