With the rise of big data, machine learning, and AI, existing software engineering techniques must be re-imagined to provide the productivity gains that developers desire. This talk will review the emerging roles of data scientists and the tools they need to build scalable, correct, and efficient software for a data-centric world.
Kim will present a large-scale study of about 800 data scientists, conducted in collaboration with Microsoft Research, which looked at data scientists’ educational backgrounds, the problem topics they work on, the tools they use, and their activities. From the gathered data, she has identified nine distinct clusters of data scientists, along with the best practices and challenges of each cluster.
In the second half of this talk, she will discuss the need to retarget the SE research community’s directions to address new challenges in the era of data-centric software development. In particular, she will detail some examples of her group’s work that reinvents debugging and testing for big data distributed systems such as Apache Spark. She will conclude with open SE problems in ML and heterogeneous computing whose solutions would support data-centric software development.
Cloud computing has become the typical way to deliver enterprise applications. As today’s cloud infrastructure becomes more and more complex, with hybrid cloud as well as AI and advanced data processing integrated into the platform, human error has become one of the major causes of failures in cloud and Internet systems, as reported by many system vendors and service providers. While various fault tolerance and recovery mechanisms are useful in handling hardware and software failures, they are less effective in handling system administrators’ errors. The recent Facebook outage on March 13th, 2019 was also caused by a server configuration error, affecting millions of users. Beyond reliability, configuration errors can also lead to security issues. OWASP lists misconfiguration as one of the top 10 most critical web security risks. In 2017, a configuration error in Amazon S3 storage exposed the personal information of 200 million users. In this talk, I will focus on a few current challenges in the human dimension of cloud computing and management. For legacy and various other reasons, most of today’s data center system management requirements (in particular system configuration) do not follow the primary design principles of human-computer interaction (HCI), namely (i) simplicity, (ii) feedback, and (iii) consistency, making cloud management error-prone for system admins.
Producing AAA games takes a lot of effort and organization. The production pipeline used at Ubisoft for its major brands like Rainbow Six, Assassin’s Creed or Far Cry is in constant evolution to produce bug-free games for our millions of players and to support the game as a service (GaaS) paradigm that is currently transforming the video-game industry. This talk will present how we have automated our debugging and profiling activities using known techniques from the software as a service world, landmarks of the SE scientific literature, and our own research. It will also present the problems we are currently tackling, in partnership with our research lab (Ubisoft La Forge), Mozilla, and several Canadian universities (Concordia, Polytechnique Montreal, ETS, McGill), to further automate our production pipeline.