Finding the right advisor is almost as important as finding the right research problem, and perhaps even more so. In this talk I will discuss strategies for developing a supportive and productive relationship with your research advisor. Bring your questions and your ideas!
Marsha Chechik is Professor in the Department of Computer Science at the University of Toronto. She received her Ph.D. from the University of Maryland in 1996. Prof. Chechik’s research interests are in the application of formal methods to improve the quality of software. She has authored numerous papers in formal methods, software specification and verification, computer safety and security, and requirements engineering. In 2002-2003, Prof. Chechik was a visiting scientist at Lucent Technologies in Murray Hill, NY and at Imperial College, London, UK, and in 2013 at Stony Brook University. She is a member of IFIP WG 2.9 on Requirements Engineering and an Associate Editor-in-Chief of the Journal on Software and Systems Modeling. She was an associate editor of IEEE Transactions on Software Engineering in 2003-2007 and 2010-2013. She regularly serves on program committees of international conferences in the areas of software engineering and automated verification. Marsha Chechik has been Program Committee Co-Chair of the 2018 International Conference on Software Engineering (ICSE18), the 2016 International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’16), the 2016 Working Conference on Verified Software: Theories, Tools, and Experiments (VSTTE16), the 2014 International Conference on Automated Software Engineering (ASE’14), the 2008 International Conference on Concurrency Theory (CONCUR’08), the 2008 International Conference on Computer Science and Software Engineering (CASCON’08), and the 2009 International Conference on Fundamental Approaches to Software Engineering (FASE’09). She will be PC Co-Chair of ESEC/FSE’2021. She is a Member of ACM SIGSOFT and the IEEE Computer Society.
Doctoral Symposium
Mon 15 Nov 2021 09:05 - 09:20 at Wombat - DS Session 1

Software requirements Change Impact Analysis (CIA) is a pivotal process in requirements engineering (RE), since changes to requirements are inevitable. When a requirement change is requested, its impact on all software artefacts has to be investigated before the request can be accepted or rejected. Manually performed CIA in large-scale software development is time-consuming and error-prone, so automating this analysis can improve the process of requirements change management. The main goal of this research is to apply a combination of Machine Learning (ML) and Natural Language Processing (NLP) techniques to develop a prediction model for forecasting the impact of a requirement change on the other requirements in the specification document. The proposed prediction model will be evaluated for accuracy and performance using appropriate datasets. The resulting tool will support project managers in performing automated change impact analysis and making informed decisions on the acceptance or rejection of requirement change requests.
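As a rough illustration of the NLP side of such a prediction model, the sketch below flags requirements whose text is most similar to a changed requirement as impact candidates; the requirement texts and the 0.2 threshold are invented assumptions, and the proposed model would add richer ML features.

```python
# A sketch of NLP-based change impact prediction: requirements whose text is
# most similar to the changed requirement are flagged as impact candidates.
# The requirement texts and the 0.2 threshold are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

requirements = {
    "R1": "The system shall encrypt all user data at rest.",
    "R2": "User passwords shall be stored using a salted hash.",
    "R3": "The UI shall display the dashboard within two seconds.",
}

def predict_impact(changed_id, threshold=0.2):
    ids = list(requirements)
    matrix = TfidfVectorizer(stop_words="english").fit_transform(
        [requirements[i] for i in ids])
    sims = cosine_similarity(matrix[ids.index(changed_id)], matrix)[0]
    return [(i, round(s, 2)) for i, s in zip(ids, sims)
            if i != changed_id and s >= threshold]

print(predict_impact("R1"))  # R2 likely ranks highest: shared vocabulary
```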
Doctoral Symposium
Mon 15 Nov 2021 09:20 - 09:35 at Wombat - DS Session 1

Experimentation plays an important role in the work of data scientists: to explore unfamiliar problem domains, to answer questions from the data, and to develop diverse machine learning applications. Good experimentation requires creativity, builds on prior results, and is informed by the literature. However, finding relevant information from relevant sources to guide experimentation causes inefficiencies in the experimentation process of data scientists. The objective of this research is to help data scientists through the presentation of context-aware ranked data science experiments, considering the problem domain, development task, and learning task. Data science experiments for this study were extracted from publicly available interactive notebooks and manually annotated based on a taxonomy of data science techniques and a meta-model of a data science experiment. A ranking algorithm was then developed that ranks data science experiments for a given problem domain and development task. As a result, a tool was developed that demonstrates context-aware ranked data science experiments for problem domains such as natural language processing, computer vision, and time series, and for development stages such as feature engineering and model selection. This study shows that tools and techniques can be designed to be aware of the data science context, much more so than current software engineering tools. It supports these efforts by providing knowledge that can improve the experimentation process of data scientists.
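A minimal sketch of the ranking idea, assuming experiments have already been annotated with a problem domain and development task (all entries below are fabricated): a context match dominates the score, and popularity only breaks ties.

```python
# A sketch of context-aware ranking over a hand-annotated corpus (all entries
# are fabricated): a match on problem domain and development task dominates
# the score, and popularity only breaks ties.
experiments = [
    {"id": "nb-101", "domain": "nlp", "task": "feature engineering", "stars": 41},
    {"id": "nb-202", "domain": "computer vision", "task": "model selection", "stars": 87},
    {"id": "nb-303", "domain": "nlp", "task": "model selection", "stars": 12},
]

def rank(domain, task):
    def score(e):
        context = 2 * (e["domain"] == domain) + (e["task"] == task)
        return (context, e["stars"])
    return sorted(experiments, key=score, reverse=True)

for e in rank("nlp", "model selection"):
    print(e["id"])  # nb-303 first: a full context match beats raw popularity
```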
Doctoral Symposium
Mon 15 Nov 2021 09:35 - 09:50 at Wombat - DS Session 1

Effectively locating and fixing defects requires detailed defect reports. Unlike traditional software systems, machine learning applications are subject to defects caused by changes in the input data streams (concept drift) and by assumptions encoded into models. Without appropriate training, developers face difficulties understanding and interpreting faults in machine learning (ML) systems. However, little research has been done on how to prepare developers to detect and investigate machine learning system defects. Software engineers often do not have sufficient knowledge to fix the issues themselves without the help of data scientists or domain experts. To investigate this issue, we analyse issue templates and check how developers report machine-learning-related issues in open-source applied AI projects. The overall goal is to develop a tool for automatically repairing ML defects, or generating defect reports if a fix cannot be made. Previous research has identified classes of faults specific to machine learning systems, such as performance degradation arising from concept drift, where the machine learning model is no longer aligned with the real-world environment. However, the issue templates that developers currently use do not seem to capture the information needed. This research seeks to systematically develop a two-way human-machine information exchange protocol that supports domain experts, software engineers, and data scientists in collaboratively detecting, reporting, and responding to these new classes of faults.
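As a sketch of one fault class named above, the snippet below detects concept drift by comparing training-time and production input distributions with a Kolmogorov-Smirnov test, then emits the kind of structured report an ML-aware issue template could capture; the data is synthetic and the 0.05 threshold and report fields are illustrative assumptions.

```python
# A sketch of detecting concept drift by comparing training-time and
# production input distributions with a Kolmogorov-Smirnov test. The data is
# synthetic; the 0.05 threshold and the report fields are assumptions.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_inputs = rng.normal(loc=0.0, scale=1.0, size=1000)
production_inputs = rng.normal(loc=0.8, scale=1.0, size=1000)  # drifted world

stat, p_value = ks_2samp(training_inputs, production_inputs)
if p_value < 0.05:
    print({"fault_class": "concept drift",
           "evidence": f"KS statistic={stat:.3f}, p={p_value:.2e}",
           "suggested_action": "retrain model on recent data"})
```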
Doctoral Symposium
Mon 15 Nov 2021 09:50 - 10:05 at Wombat - DS Session 1

Software developers embed logging statements in source code as an imperative duty of modern software development, since log files are necessary for tracking down runtime system issues and troubleshooting system management tasks. Prior research has emphasized the importance of logging statements in the operation and debugging of software systems. However, the current logging process is mostly manual and ad hoc, and thus proper placement and content of logging statements remain challenges. To overcome these challenges, methods that aim to automate log placement and log content, i.e., ‘where, what, and how to log’, are of high interest. Thus, in this research, we aim to predict log statements by utilizing source code clones and natural language processing (NLP), pursuing four research objectives: (RO1) investigate whether source code clones can be leveraged for log statement location prediction, (RO2) propose a clone-based approach for log statement prediction, (RO3) predict a log statement’s description with code-clone and NLP models, and (RO4) examine approaches to automatically predict additional details of a log statement, such as its verbosity level and variables. For this purpose, we perform an experimental analysis on seven open-source Java projects, extract their method-level code clones, investigate their attributes, and utilize them for log location and description prediction. Our work demonstrates the effectiveness of log-aware clone detection for automated log location and description prediction and outperforms prior work.
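A minimal sketch of the clone-based idea behind RO1 and RO2: find the most similar already-logged method and reuse its log statement as the suggestion. The corpus, the query method, and the 0.5 similarity threshold are hypothetical.

```python
# A sketch of clone-based log prediction: the most similar already-logged
# method donates its log statement. Corpus and threshold are hypothetical.
from difflib import SequenceMatcher

logged_corpus = {
    "void connect(String host) { socket.open(host); }":
        'logger.info("connecting to {}", host);',
    "void close() { socket.close(); }":
        'logger.debug("closing socket");',
}

def predict_log(new_method):
    def similarity(method):
        return SequenceMatcher(None, method, new_method).ratio()
    clone = max(logged_corpus, key=similarity)
    return logged_corpus[clone] if similarity(clone) > 0.5 else None

print(predict_log("void connect(String url) { socket.open(url); }"))
```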
Doctoral Symposium
Mon 15 Nov 2021 10:05 - 10:20 at Wombat - DS Session 1
Doctoral Symposium
Mon 15 Nov 2021 10:30 - 10:45 at Wombat - DS Session 2

We propose an automated pipeline for analyzing privacy leaks in Android applications. By using a combination of dynamic and static analysis, we validate the results of each against the other to improve accuracy. Compared to state-of-the-art approaches, we not only capture network traffic for analysis, but also look into the data flows inside the application. We particularly focus on privacy leakage caused by third-party services and high-risk permissions. The proposed automated approach combines taint analysis, permission analysis, network traffic analysis, and dynamic function tracing at runtime to identify private information leaks. We further implement an automatic validation and complementation process to reduce false positives. A small-scale experiment has been conducted on 30 Android applications, and a large-scale experiment on more than 10,000 Android applications is in progress.
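A minimal sketch of the validation step: a static taint flow is only confirmed when dynamic network capture observed traffic to the same sink, and unconfirmed flows become candidates for false positives. All flows and hosts below are fabricated.

```python
# A sketch of cross-validating static and dynamic results: a static taint
# flow is confirmed only if dynamic capture saw traffic to the same sink.
# All flows and hosts are fabricated.
static_taint_flows = {
    ("device_id", "ads.tracker.example"),
    ("contacts", "api.legit.example"),
    ("location", "analytics.example"),
}
dynamic_network_sinks = {"ads.tracker.example", "analytics.example"}

confirmed = {(src, sink) for src, sink in static_taint_flows
             if sink in dynamic_network_sinks}
print("confirmed leaks:", sorted(confirmed))
print("needs review:", sorted(static_taint_flows - confirmed))
```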
Doctoral Symposium
Mon 15 Nov 2021 10:45 - 11:00 at Wombat - DS Session 2

Can a machine find and fix a Semantic Bug? A Semantic Bug is a deviation from the expected program behaviour that causes the program to produce incorrect outputs for certain inputs. To identify this category of bugs, knowledge of the expected program behaviour is essential, because a program with a semantic bug does not, in most scenarios, fail (i.e., crash or hang) in the middle of execution. Thus, only a human (a user or a developer) who knows the correct program behaviour can detect this kind of bug by observing the output. However, identifying bugs solely through human effort is not practical for all software. A Test Oracle is any procedure used to differentiate the correct and incorrect behaviours of a program. This dissertation mainly focuses on developing learning techniques to produce Automated Test Oracles for programs with semantic bugs. Discovering methods to incorporate human knowledge effectively into these learning techniques is another concern. Automated test oracles could make semantic bug detection more efficient, and could also guide Automated Program Repair tools to generate more accurate fixes for semantic bugs.
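A minimal sketch of a learned test oracle, assuming a human has labelled some (input, output) pairs of an absolute-value routine as correct or incorrect; a classifier trained on those labels then flags a semantic bug that never crashes. The training data and the buggy program are invented.

```python
# A sketch of a learned test oracle: a classifier trained on human-labelled
# (input, output) pairs flags a semantic bug that never crashes. The training
# data and the buggy program are invented.
from sklearn.tree import DecisionTreeClassifier

# (input, observed_output) pairs labelled by a human: 1 = correct, 0 = incorrect
X = [[-3, 3], [-1, 1], [2, 2], [5, 5], [-2, -2], [-4, 4], [0, 0], [-5, 5]]
y = [1, 1, 1, 1, 0, 1, 1, 1]
oracle = DecisionTreeClassifier(random_state=0).fit(X, y)

def buggy_abs(x):
    return -7 if x == -7 else abs(x)  # wrong output for -7, but no crash

for x in (-7, 4):
    out = buggy_abs(x)
    verdict = oracle.predict([[x, out]])[0]
    print(f"abs({x}) -> {out}: {'pass' if verdict else 'FAIL'}")
```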
Doctoral Symposium
Mon 15 Nov 2021 11:00 - 11:15 at Wombat - DS Session 2

Binary code similarity detection measures the similarity of code at the binary (assembly) level, without source code. Existing works have limitations when dealing with mutated binary code generated by different compilation options. In this paper, we propose a novel approach to address this problem. By inspecting binary code, we found that, generally, within a function some instructions calculate (prepare) values for other instructions. We define these latter instructions as key instructions. Currently, we define four categories of key instructions: subfunction calls, compare instructions, return instructions, and memory-store instructions. Thus, if we symbolically execute similar binary code, the symbolic values at these key instructions are expected to be similar. We implement a prototype tool that works in three steps: first, it symbolically executes the binary code; second, it extracts the symbolic values at the defined key instructions into a graph; last, it compares the similarity of the symbolic graphs. Our implementation also addresses problems such as path explosion and loop handling.
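A minimal sketch of the comparison step, assuming symbolic values at key instructions were already extracted (the strings below mimic one function compiled at -O0 and -O2); register names are normalised so differing allocations still match, and for brevity a Jaccard similarity over sets stands in for the proposed graph comparison.

```python
# A sketch of comparing symbolic values at key instructions. The expression
# strings are hand-written stand-ins for real symbolic-execution output, and
# Jaccard similarity replaces the proposed graph comparison for brevity.
def normalise(expr):
    for reg in ("eax", "ebx", "ecx", "edx"):
        expr = expr.replace(reg, "r")  # differing register allocations match
    return expr

binary_a = {"ret:r0*2+arg0", "cmp:arg0<10", "call:eax+4"}
binary_b = {"ret:r0*2+arg0", "cmp:arg0<10", "call:ebx+4"}

a = {normalise(e) for e in binary_a}
b = {normalise(e) for e in binary_b}
print(f"similarity = {len(a & b) / len(a | b):.2f}")  # 1.00: likely a match
```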
Doctoral Symposium
Mon 15 Nov 2021 11:15 - 11:30 at Wombat - DS Session 2

Android apps are developed using a Software Development Kit (SDK), where the Android application programming interface (API) enables app developers to harness the functionalities of Android devices by interacting with services and hardware. However, the API frequently evolves together with its associated SDK. The mismatch between the API level supported by the device on which an app is installed and the API level targeted by the app’s developers can induce compatibility issues. These issues can manifest as unexpected behaviors, including runtime crashes, creating a poor user experience. Recent studies have investigated API evolution to ensure the reliability of Android apps; however, they leave room for improvement. This work aims to establish novel methodologies that improve on state-of-the-art compatibility issue detection and testing approaches.
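A minimal sketch of the detection idea: flag API calls introduced after the app's declared minSdkVersion, which would crash on older devices. The API-level mapping and the app's call list are illustrative assumptions.

```python
# A sketch of compatibility checking: flag API calls introduced after the
# app's declared minSdkVersion. The mapping and call list are assumptions.
API_INTRODUCED_IN = {
    "android.app.NotificationChannel.<init>": 26,
    "java.time.LocalDate.now": 26,
    "android.view.View.setOnClickListener": 1,
}

def check_compatibility(called_apis, min_sdk):
    return [(api, level) for api in called_apis
            if (level := API_INTRODUCED_IN.get(api, 0)) > min_sdk]

app_calls = ["android.app.NotificationChannel.<init>",
             "android.view.View.setOnClickListener"]
for api, level in check_compatibility(app_calls, min_sdk=23):
    print(f"{api} requires API {level} but minSdkVersion is 23: add a guard")
```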
Dr. Li Li is a Senior Lecturer (a.k.a. Associate Professor) and a PhD supervisor at Monash University, Australia. He received his PhD degree in computer science from the University of Luxembourg in 2016. He has published over 60 research papers at prestigious conferences such as ICSE, ESEC/FSE, ASE, ISSTA, POPL, and PLDI, and in prestigious journals such as ACM TOSEM and IEEE TSE, TIFS, and TDSC.
Li’s research has received various awards, including the prestigious ARC Discovery Early Career Researcher Award (DECRA), an ACM Distinguished Paper Award at PLDI 2021, a Best Student Paper Award at WWW 2020, an ACM Distinguished Paper Award at ASE 2018, a FOSS Impact Paper Award at MSR 2018, and a Best Paper Award at the ERA track of IEEE SANER 2016.
Li was named one of the top five most impactful early-career software engineering researchers in the world.
Li is an active member of the software engineering and security community, serving as a reviewer for many top-tier conferences and journals such as ASE, ICSME, SANER, TSE, TOSEM, TIFS, TDSC, TOPS, EMSE, JSS, and IST.
Bara Buhnova is an Associate Professor and Vice-Dean for Industrial Partners at Masaryk University (MU), Faculty of Informatics in Brno. She received her Ph.D. degree in computer science in 2008 in the Czech Republic, continued as a postdoc researcher in Germany and Australia, and now leads multiple research teams at the Faculty of Informatics MU (software architecture, smart cities), the Institute of Computer Science MU (big data analytics), and the Czech CyberCrime Centre of Excellence C4e (critical infrastructures). She is the Steering Committee chair of the ICSA conference and has been involved in the organization of numerous leading conferences (e.g. ICSE, ICSA, ESEC-FSE, QoSA, CBSE, MOBILESoft). She acts as a reviewer and (guest) editor for multiple journals (e.g. IEEE TSE, Springer EMSE, Elsevier SCP, Elsevier JSS, Springer SoSyM, Wiley SME) and is a member of the IEEE TSE Review Board. Next to her academic activities, she chairs the Association of Industrial Partners of the Faculty of Informatics MU (with 30+ companies) and is a Co-Founding and Governing Board member of Czechitas, a non-profit organisation that aims to make IT skills more accessible to youth and women of any age (reaching 20,000+ graduates).
Doctoral Symposium
Mon 15 Nov 2021 18:45 - 19:00 at Wombat - DS Session 3

Software systems evolve continuously during their lifecycle. Developers incrementally introduce new features and fix bugs along the way, accumulating a wealth of changes and artifacts. Driven by the rich data recorded in version control systems and issue trackers, much work has been done to analyze software histories. In this PhD work, we propose a universal representation to effectively store and query knowledge extracted from these histories, with the aim of supporting software evolution research. We have created a toolset, named DiffBase, to extract both relations between program entities within the same version and atomic changes between versions. Users can then compose queries using algebraic operators, Datalog, or an SQL-like language to accomplish several different evolution management tasks. Building on these results, possible future work includes utilizing the facts-based approach in a scalable solution for discovering compatibility issues involving changes to multiple components, and improving the storage and query performance of DiffBase.
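A minimal sketch of the fact-based querying idea (not DiffBase's actual API): relations and atomic changes are stored as tuples, and a Datalog-style rule joins them to find callers affected by methods changed between two versions. The facts are made up.

```python
# A sketch of fact-based querying over extracted history facts (not
# DiffBase's actual API). All facts below are made up.
calls = {("Client.run", "Parser.parse"), ("Client.run", "Log.write"),
         ("Tool.main", "Parser.parse")}
changed = {("v1", "v2", "Parser.parse")}  # atomic change between two versions

# Datalog-style rule: affected(C) :- calls(C, M), changed(_, _, M).
changed_methods = {m for _, _, m in changed}
affected = {caller for caller, callee in calls if callee in changed_methods}
print(sorted(affected))  # ['Client.run', 'Tool.main']
```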
Doctoral Symposium
Mon 15 Nov 2021 19:00 - 19:15 at Wombat - DS Session 3

Unmanned aerial systems (UAS) have a large number of applications in civil and military domains. UAS rely on various avionics systems that are safety-critical and mission-critical. A major requirement of international safety standards is to perform rigorous system-level testing of avionics systems, including software systems. The current industrial practice is to create test scenarios manually, execute these scenarios manually or automatically using simulators, and evaluate the outcomes manually. A fundamental part of system-level testing of such systems is the simulation of the environmental context: test scenarios typically consist of setting certain environmental conditions and testing the system under test in these settings. The state-of-the-art approaches available for this purpose also require manual test scenario development and manual test evaluation. In this research, we propose an approach to automate the system-level testing of UAS. The proposed approach (AITester) utilizes model-based testing and artificial intelligence (AI) techniques to automatically generate, execute, and evaluate various test scenarios. The test scenarios are generated on the fly, i.e., during test execution, based on the environmental context at runtime. We are developing a toolset to support this automation. We performed a pilot experiment using a widely used open-source autopilot, ArduPilot. The preliminary results show that AITester is effective and efficient in violating environmental conditions.
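A rough sketch of on-the-fly scenario generation, with a stubbed "autopilot" standing in for a real simulator such as ArduPilot's SITL: environmental parameters are sampled during execution and each outcome is evaluated automatically. The parameter ranges and the 18 m/s gust tolerance are invented assumptions.

```python
# A sketch of on-the-fly scenario generation with a stubbed "autopilot".
# Parameter ranges and the gust tolerance are invented assumptions.
import random

random.seed(7)

def autopilot_altitude_hold(wind_gust_mps):
    return "stable" if wind_gust_mps < 18 else "altitude deviation"

for _ in range(5):
    scenario = {"wind_gust_mps": round(random.uniform(0, 25), 1),
                "visibility_m": random.choice([50, 500, 5000])}
    outcome = autopilot_altitude_hold(scenario["wind_gust_mps"])
    verdict = "PASS" if outcome == "stable" else "FAIL"
    print(f"{scenario} -> {outcome} [{verdict}]")
```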
Hassan Sartaj is an Assistant Professor at the National University of Computer and Emerging Sciences, Islamabad, Pakistan. He received his Ph.D. in Software Engineering in 2021. He is a member of the IEEE Computer Society and Chapter Treasurer of the IEEE Islamabad Chapter (C16).
Doctoral Symposium
Mon 15 Nov 2021 19:15 - 19:30 at Wombat - DS Session 3

Deep learning-based techniques have been widely applied to program analysis tasks in fields such as type inference, fault localization, and code summarization. Hitherto, deep learning-based software engineering systems have relied entirely on supervised learning approaches, which require laborious manual effort to collect and label a prohibitively large amount of data. However, most Turing-complete imperative languages share similar control- and data-flow structures, which makes it possible to transfer knowledge learned from one language to another. In this paper, we propose PLATO, a general cross-lingual transfer learning framework for program analysis, built on a series of techniques that generalize across downstream tasks. PLATO allows BERT-based models to leverage prior knowledge learned from the labeled dataset of one language and transfer it to others. We evaluate our approach on several downstream tasks, such as type inference and code summarization, to demonstrate its feasibility.
Doctoral Symposium
Mon 15 Nov 2021 19:30 - 19:45 at Wombat - DS Session 3

Fuzzing is a technique that aims to detect vulnerabilities or exceptions through unexpected input, and it has attracted tremendous recent interest in both academia and industry. Although fuzzing methods have great advantages in the field of vulnerability detection, each also has its own disadvantages when facing different target programs. It is obviously impractical for a single fuzzing method to suit all target programs. Therefore, we study how to select the appropriate fuzzing method for a given target program. Specifically, we first analyze the program and extract feature vectors that capture information about it, such as its syntax and context. Next, we build a matching model that measures the similarity between the target program and each fuzzing algorithm and selects the fuzzing algorithm with the highest matching degree. Through this matching model, we obtain a more suitable fuzzing algorithm and thereby improve detection efficiency, precision, recall, F-measure, and other statistical measures.
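A minimal sketch of the matching model: the target program and each fuzzing algorithm are described by feature vectors (the dimensions and profiles below are invented), and the algorithm with the highest cosine similarity is selected.

```python
# A sketch of fuzzer selection by vector matching. The feature dimensions
# and fuzzer profiles are invented, not measured data.
import math

fuzzer_profiles = {                 # (parsing-heavy, state-heavy, binary-only)
    "grammar-based": [0.9, 0.1, 0.2],
    "coverage-guided": [0.3, 0.8, 0.6],
    "symbolic-assisted": [0.2, 0.9, 0.3],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

target_program = [0.8, 0.2, 0.1]    # e.g. a parser with shallow state
best = max(fuzzer_profiles, key=lambda f: cosine(target_program, fuzzer_profiles[f]))
print("selected fuzzer:", best)     # grammar-based
```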
Doctoral Symposium
Mon 15 Nov 2021 20:15 - 20:30 at Wombat - DS Session 4

When users deploy or invoke smart contracts on Ethereum, a fee is charged to prevent resource abuse. Metered in gas, the fee is the product of the amount of gas used and the gas price: the more gas used, the higher the transaction fee. In my doctoral research, I investigate two widely studied issues regarding gas: gas estimation and gas optimization. The former predicts the gas cost of executing a transaction in order to avoid out-of-gas exceptions; the latter modifies existing contracts to save on transaction fees. I target problems that previous work has not solved: gas estimation for loop functions, and gas optimization for storage usage and arrays. I expect this research to help Ethereum users avoid economic losses from out-of-gas exceptions and pay lower transaction fees.
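A minimal sketch of loop-aware gas estimation: the total cost is modelled as a fixed base plus a per-iteration cost times the trip count. The gas figures used are rough illustrative values, not an exact EVM cost model.

```python
# A sketch of loop-aware gas estimation. The figures (21000 base, 20000 per
# fresh SSTORE, ~110 per loop step) are rough illustrative values, not an
# exact EVM cost model.
BASE_TX_GAS = 21_000
SSTORE_NEW_SLOT = 20_000
LOOP_OVERHEAD = 110

def estimate_loop_gas(iterations, stores_per_iteration):
    per_iter = LOOP_OVERHEAD + stores_per_iteration * SSTORE_NEW_SLOT
    return BASE_TX_GAS + iterations * per_iter

for n in (10, 100, 1_000):
    print(f"{n:>5} iterations -> ~{estimate_loop_gas(n, 1):,} gas")
# Optimisation angle: batching writes in memory and storing once replaces
# n SSTOREs with one, a typical storage/array optimisation target.
```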
Doctoral Symposium
Mon 15 Nov 2021 20:30 - 20:45 at Wombat - DS Session 4

Smartphones and mobile apps have become an essential part of our daily lives, and it is necessary to ensure the quality of these apps. Two important aspects of code quality are maintainability and security. The goals of my PhD project are (1) to study code smells, security issues, and their evolution in iOS apps and frameworks, (2) to enhance training and teaching using visualisation support, and (3) to support developers in automatically detecting dependencies on vulnerable library elements in their apps. For each of these three goals, dedicated tool support will be provided: GraphifyEvolution, VisualiseEvolution, and DependencyEvolution, respectively. GraphifyEvolution already exists and has been applied to analyse code smells in iOS apps written in Swift; it has a modular architecture and can be extended with support for additional languages and external analysis tools. In the remaining two years of my PhD studies, I will complete the other two tools and apply them in case studies with developers in industry as well as in university teaching.
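A minimal sketch of the DependencyEvolution idea: match the library elements an app actually uses against a vulnerability database keyed by element and affected version range. The database contents, the app's usage list, and the version-constraint format are invented.

```python
# A sketch of detecting uses of vulnerable library elements. The database,
# usage list, and the "<x.y.z" constraint format are invented.
vulnerable_elements = {
    ("CryptoKit", "MD5.hash"): ["<1.2.0"],
    ("NetLib", "Session.init"): ["<2.0.1"],
}
app_uses = [("CryptoKit", "MD5.hash", "1.1.3"), ("NetLib", "Session.init", "2.3.0")]

def affected(version, constraint):  # supports only "<x.y.z" constraints
    limit = tuple(map(int, constraint.lstrip("<").split(".")))
    return tuple(map(int, version.split("."))) < limit

for lib, element, version in app_uses:
    for constraint in vulnerable_elements.get((lib, element), []):
        if affected(version, constraint):
            print(f"{lib}.{element} {version} is vulnerable ({constraint})")
```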
Doctoral Symposium
Mon 15 Nov 2021 20:45 - 21:00 at Wombat - DS Session 4

The supervision of modern IT systems brings new opportunities and challenges by making available big data streams that, if properly analysed, can support high standards of scalability, reliability, and efficiency. Rule-based inference engines on streaming data are a key component of maintenance systems for detecting anomalies and automating their resolution, but they remain confined to simple and general rules, a lesson learned from the expert systems era. Artificial Intelligence for Operations Systems (AIOps) proposes to take advantage of advanced analytics, such as machine learning and data mining on big data, to improve every step of supervision systems, including incident management (detection, triage, root cause analysis, automated healing). However, the best AIOps techniques often rely on "opaque" models, strongly limiting their adoption. In this thesis, we study how Subgroup Discovery can help AIOps. This data mining technique offers possibilities to extract hypotheses from data and from predictive models, helping experts understand the underlying processes that generate the data and the predictions, respectively. To ensure the relevancy of our propositions, this project involves both data mining researchers and practitioners from Infologic, a French software vendor.
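A minimal sketch of Subgroup Discovery on incident records: enumerate simple attribute = value conditions and rank them by weighted relative accuracy (WRAcc), surfacing human-readable hypotheses such as "incidents on database hosts escalate more often". The records are synthetic.

```python
# A sketch of Subgroup Discovery with the WRAcc quality measure over
# synthetic incident records.
records = [  # (host_class, weekday, escalated?)
    ("db", "mon", 1), ("db", "tue", 1), ("db", "fri", 0), ("web", "mon", 0),
    ("web", "tue", 0), ("web", "fri", 1), ("cache", "mon", 0), ("db", "mon", 1),
]
attrs = {"host_class": 0, "weekday": 1}
p_overall = sum(r[2] for r in records) / len(records)

def wracc(idx, value):
    covered = [r for r in records if r[idx] == value]
    if not covered:
        return 0.0
    p_sub = sum(r[2] for r in covered) / len(covered)
    return len(covered) / len(records) * (p_sub - p_overall)

subgroups = sorted(((a, v, wracc(i, v)) for a, i in attrs.items()
                    for v in {r[i] for r in records}),
                   key=lambda t: t[2], reverse=True)
for attr, value, quality in subgroups[:3]:
    print(f"{attr} = {value}: WRAcc = {quality:+.3f}")
```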
Doctoral Symposium
Mon 15 Nov 2021 21:00 - 21:15 at Wombat - DS Session 4

Despite microservices and other component-based architecture styles having been state of the art in research for many years now, issue management across the boundaries of a single component is still challenging. Components that were developed independently and can be used independently are joined together in the overall architecture, which results in dependencies between those components. Due to these dependencies, bugs can arise that propagate along the call chains through the architecture. Other types of issues, such as violations of non-functional quality properties, can also impact other components. However, traditional issue management systems end at the boundaries of a component, making tracking of issues across different components time-consuming and error-prone. Therefore, cross-component issue management needs automation that puts the issues of independent components into the correct mutual context, creates new cross-component issues, and syncs cross-component issues between different components. Such automation could enable developers to manage issues across components as efficiently as possible and increase the system’s quality. To solve this problem, we propose an initial approach for semi-automated cross-component issue management in conjunction with service-level objectives, based on our Gropius system. For example, relationships between issues of the same or different components can be predicted using classification to identify dependencies of issues across component boundaries. In addition, we are developing a system to model, monitor, and alert on service-level objectives. Based on this, the impact of such quality violations on the overall system and the business process will be analysed and explained through cross-component issues.
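A minimal sketch of the propagation idea (not the actual Gropius API): a bug filed on one component is put into mutual context by walking the dependency graph and creating linked cross-component issues in every transitively affected component. The architecture below is hypothetical.

```python
# A sketch of cross-component issue propagation over a hypothetical
# architecture (not the actual Gropius API).
depends_on = {                    # caller -> callees
    "frontend": ["orders"],
    "orders": ["payments", "inventory"],
    "payments": [],
    "inventory": [],
}

def affected_by(buggy):
    reverse = {c: [k for k, deps in depends_on.items() if c in deps]
               for c in depends_on}
    seen, stack = set(), [buggy]
    while stack:
        for caller in reverse[stack.pop()]:
            if caller not in seen:
                seen.add(caller)
                stack.append(caller)
    return seen

for component in sorted(affected_by("payments")):
    print(f"create linked cross-component issue in {component}")
# -> frontend and orders, since both sit on a call chain to payments
```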
For a CV, please take a look at https://www.linkedin.com/in/sandro-speth/.
Doctoral Symposium
Mon 15 Nov 2021 21:15 - 21:30 at Wombat - DS Session 4

Non-deterministically behaving tests impede software development, as they hamper regression testing, destroy trust, and waste resources. This phenomenon, also called test flakiness, has received increasing attention over the past years. The multitude of both peer-reviewed literature and online blog articles touching on the issue illustrates that flaky tests are deemed both a relevant research topic and a serious problem in everyday business. A major shortcoming of existing work aiming to mitigate flaky tests is its limited applicability, since many of the proposed tools rely heavily on specific ecosystems. This issue is also reflected in various attempts to investigate flaky tests: using mostly similar sets of open-source Java projects, many studies are unable to generalize their findings to projects lying beyond this scope. On top of that, a holistic understanding of flaky tests also suffers from a lack of analyses focusing on the developers’ perspective, with most existing studies taking a code-centric approach. With my work, I want to close these gaps: I plan to create an overarching and widely applicable framework that empowers developers to tackle flaky tests through existing and novel techniques and enables researchers to swiftly deploy and evaluate new approaches. As a starting point, I am studying test flakiness from previously unconsidered angles: I widen the scope of observation, investigating flakiness beyond the realm of the Java ecosystem while also capturing practitioners’ opinions. By adding to the understanding of the phenomenon, I hope not only to close existing research gaps but also to arrive at a clear vision of how research on test flakiness can create value for developers working in the field.
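A minimal sketch of the most ecosystem-agnostic flakiness check: rerun each test several times and flag any test whose verdict is not constant. The flaky test below fails nondeterministically through an unseeded random number generator, purely for illustration.

```python
# A sketch of rerun-based flakiness detection; the flaky test fails
# nondeterministically through an unseeded RNG, for illustration.
import random

def test_stable():
    assert 1 + 1 == 2

def test_flaky():
    assert random.random() > 0.3  # hidden randomness/order/async dependency

def detect_flaky(tests, reruns=20):
    flaky = []
    for test in tests:
        verdicts = set()
        for _ in range(reruns):
            try:
                test()
                verdicts.add("pass")
            except AssertionError:
                verdicts.add("fail")
        if len(verdicts) > 1:
            flaky.append(test.__name__)
    return flaky

print(detect_flaky([test_stable, test_flaky]))  # very likely ['test_flaky']
```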
Automatically constructing a program based on desired specifications has been studied for decades. Despite advances in the field of program synthesis, current approaches still synthesize a secluded code snippet and leave the task of reusing it in an existing code base to program developers. Due to its program-wide effects, synthesizing an architectural tactic and reusing it in a program is even more challenging. Architectural tactics need to be synthesized based on the context of different locations in the program, broken down into smaller pieces, and added at corresponding locations in the code. Moreover, each piece needs to establish correct data- and control-dependencies to its surrounding environment as well as to the other synthesized pieces. This is an error-prone and challenging task, especially for novice program developers.
In this paper, we introduce a novel program synthesis approach that synthesizes architectural tactics and adds them to an existing code base.
A limitation of current program synthesis methods is that the synthesized programs are small in scale and simple in logic. In this work, we introduce an effective program synthesis approach based on algorithm pseudocode. By parsing the pseudocode, critical information such as the control-structure skeleton and variable names can be obtained and used to guide the synthesis process. Experiments show that the information extracted from pseudocode helps to reduce the program search space and enhances the ability of the synthesizer, which can then synthesize complex programs with control structures.
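A minimal sketch of the extraction step, using naive keyword matching over an invented pseudocode snippet: the control-structure skeleton and candidate variable names are pulled out, information that can then prune the synthesis search space.

```python
# A sketch of extracting guidance from pseudocode via naive keyword matching.
# The pseudocode snippet is invented for illustration.
import re

pseudocode = """
for i from 1 to n:
    if a[i] > best:
        best = a[i]
return best
"""

KEYWORDS = {"for", "if", "while", "return", "from", "to"}
skeleton = [tok for line in pseudocode.splitlines()
            for tok in line.split()[:1] if tok in KEYWORDS]
variables = sorted(set(re.findall(r"\b[a-z_]\w*\b", pseudocode)) - KEYWORDS)

print("control skeleton:", skeleton)      # ['for', 'if', 'return']
print("variable candidates:", variables)  # ['a', 'best', 'i', 'n']
```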
Selecting which libraries (‘dependencies’ or ‘packages’ in the industry’s jargon) to adopt in a project is an essential task in software development. The quality of the corresponding source code, from security to timeliness, is a key factor behind this selection. Yet, how easy is it to find the ‘actual’ source? How reliable is this information? To address this problem, I developed an approach called py2src to automatically identify the GitHub source code repositories corresponding to packages in PyPI and to automatically provide an indicator of the reliability of this information. I also report a preliminary empirical evaluation.
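A minimal sketch of the core py2src step (mirroring the idea, not the tool's actual implementation): query PyPI's JSON API and collect candidate GitHub URLs from a package's metadata fields; agreement across fields can then serve as a reliability indicator.

```python
# A sketch of mapping a PyPI package to GitHub candidates via PyPI's public
# JSON API (the idea behind py2src, not its actual implementation).
import json
import re
from urllib.request import urlopen

def github_candidates(package):
    with urlopen(f"https://pypi.org/pypi/{package}/json") as resp:
        info = json.load(resp)["info"]
    fields = [info.get("home_page") or "",
              json.dumps(info.get("project_urls") or {}),
              info.get("description") or ""]
    pattern = re.compile(r"https://github\.com/[\w.-]+/[\w.-]+")
    return {url.rstrip("/").lower() for field in fields
            for url in pattern.findall(field)}

print(github_candidates("requests"))  # ideally a single repository URL
```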
I am a Ph.D. student at the University of Trento. My research interests are software supply chain security and malware detection.