Thu 24 Sep 2020 10:40 - 10:45 at Wombat - Tool Demo Showcase (3) Chair(s): Csaba Nagy
In this paper, we present Sosed, a tool for discovering similar software projects. We use fastText to compute the embeddings of sub-tokens into a dense space for 120,000 GitHub repositories in 200 languages. Then, we cluster the embeddings to identify groups of semantically similar sub-tokens that reflect topics in source code. We use a dataset of 9 thousand GitHub projects as a reference search base. To identify similar projects, we compare the distributions of clusters among their sub-tokens. The tool receives an arbitrary project as input, extracts sub-tokens in the 16 most popular programming languages, computes the project's cluster distribution, and finds the projects with the closest distributions in the search base. We labeled the sub-token clusters with short descriptions so that Sosed can produce interpretable output.
Sosed is available at https://github.com/JetBrains-Research/sosed/. The tool demo is available at https://www.youtube.com/watch?v=LYLkztCGRt8. The multi-language extractor of sub-tokens is available separately at https://github.com/JetBrains-Research/identifiers-extractor/.
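The pipeline described in the abstract (split identifiers into sub-tokens, map sub-tokens to clusters, compare cluster distributions) can be illustrated with a small sketch. This is not the Sosed implementation: the sub-token-to-cluster table, the project names, and the helper functions below are hypothetical, and cosine similarity is assumed here as one plausible way to compare distributions. In the actual tool, the clusters come from clustering fastText embeddings rather than from a hand-written table.

```python
import math
import re
from collections import Counter

# Hypothetical toy mapping from sub-token to cluster id. In Sosed the clusters
# are obtained by clustering fastText embeddings; this table is an assumption.
TOKEN_TO_CLUSTER = {
    "http": 0, "request": 0, "server": 0,    # "networking" topic
    "matrix": 1, "vector": 1, "tensor": 1,   # "linear algebra" topic
    "button": 2, "window": 2, "click": 2,    # "GUI" topic
}
N_CLUSTERS = 3


def split_subtokens(identifier):
    """Split an identifier on snake_case and camelCase boundaries."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [part.lower() for part in parts]


def cluster_distribution(identifiers):
    """Return the normalized share of each cluster among known sub-tokens."""
    counts = Counter()
    for identifier in identifiers:
        for token in split_subtokens(identifier):
            cluster = TOKEN_TO_CLUSTER.get(token)
            if cluster is not None:
                counts[cluster] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * N_CLUSTERS
    return [counts[c] / total for c in range(N_CLUSTERS)]


def cosine_similarity(p, q):
    """Cosine similarity of two vectors; 0.0 if either one is all zeros."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def most_similar(query, search_base, k=1):
    """Return the names of the k projects closest to the query distribution."""
    ranked = sorted(
        search_base.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical search base: project name -> cluster distribution.
search_base = {
    "web-framework": cluster_distribution(["handleHttpRequest", "start_server"]),
    "linear-algebra": cluster_distribution(["matrixVectorProduct", "Tensor"]),
    "gui-toolkit": cluster_distribution(["onButtonClick", "MainWindow"]),
}

# Identifiers extracted from a hypothetical input project.
query = cluster_distribution(["HTTPServer", "parse_request", "buttonLabel"])
print(most_similar(query, search_base))  # ['web-framework']
```

Because each project is reduced to a short vector of cluster shares, the comparison does not depend on the programming language of the project, and the dominant clusters of a match can be reported through their labels.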
Tue 22 Sep Times are displayed in time zone: (UTC) Coordinated Universal Time
16:00 - 17:00: Maintenance and Evolution (3) | Research Papers / Tool Demonstrations at Koala | Chair(s): Yongjie Zheng (California State University San Marcos)
16:00 - 16:20 Talk | Subdomain-Based Generality-Aware Debloating | Research Papers | Qi Xin (Georgia Institute of Technology), Myeongsoo Kim (Georgia Institute of Technology), Qirun Zhang (Georgia Institute of Technology, USA), Alessandro Orso (Georgia Tech)
16:20 - 16:40 Talk | Revisiting the relationship between fault detection, test adequacy criteria, and test set size | Research Papers | Yiqun Chen (University of Washington), Rahul Gopinath (CISPA Helmholtz Center for Information Security), Anita Tadakamalla (George Mason University, USA), Michael D. Ernst (University of Washington, USA), Reid Holmes (University of British Columbia), Gordon Fraser (University of Passau), Paul Ammann (George Mason University, USA), René Just (University of Washington, USA)
16:40 - 16:50 Talk | WASim: Understanding WebAssembly Applications through Classification | Tool Demonstrations
16:50 - 17:00 Talk | Sosed: a tool for finding similar software projects | Tool Demonstrations | Egor Bogomolov (JetBrains Research), Yaroslav Golubev (JetBrains Research, ITMO University), Artyom Lobanov (JetBrains Research), Vladimir Kovalenko (JetBrains Research, JetBrains N.V.), Timofey Bryksin (JetBrains Research, Saint Petersburg State University)
Thu 24 Sep
10:20 - 11:20: Tool Demo Showcase (3) | Tool Demonstrations at Wombat | Chair(s): Csaba Nagy (Software Institute - USI, Lugano, Switzerland)
10:20 - 10:25 Talk | FILO: FIx-LOcus Localization for Backward Incompatibilities Caused by Android Framework Upgrades | Tool Demonstrations | Marco Mobilio (University of Milano Bicocca), Oliviero Riganelli (University of Milano-Bicocca, Italy), Daniela Micucci (University of Milano-Bicocca, Italy), Leonardo Mariani (University of Milano Bicocca)
10:25 - 10:30 Talk | EXPRESS: An Energy-Efficient and Secure Framework for Mobile Edge Computing and Blockchain based Smart Systems | Tool Demonstrations
10:30 - 10:35 Talk | SmartBugs: A Framework to Analyze Solidity Smart Contracts | Tool Demonstrations | João F. Ferreira (INESC-ID and IST, University of Lisbon), Pedro Cruz (IST, University of Lisbon, Portugal), Thomas Durieux (KTH Royal Institute of Technology, Sweden), Rui Abreu (Faculty of Engineering, University of Porto, Portugal)
10:35 - 10:40 Talk | RepoSkillMiner: Identifying software expertise from GitHub repositories using Natural Language Processing | Tool Demonstrations | Efstratios Kourtzanidis (University of Macedonia), Alexander Chatzigeorgiou (University of Macedonia), Apostolos Ampatzoglou (University of Macedonia)
10:40 - 10:45 Talk | Sosed: a tool for finding similar software projects | Tool Demonstrations | Egor Bogomolov (JetBrains Research), Yaroslav Golubev (JetBrains Research, ITMO University), Artyom Lobanov (JetBrains Research), Vladimir Kovalenko (JetBrains Research, JetBrains N.V.), Timofey Bryksin (JetBrains Research, Saint Petersburg State University)
10:45 - 10:50 Talk | GUI2WiRe: Rapid Wireframing with a Mined and Large-Scale GUI Repository using Natural Language Requirements | Tool Demonstrations | Kristian Kolthoff (Institute for Enterprise Systems (InES), University of Mannheim), Christian Bartelt (Institute for Software and Systems Engineering, TU Clausthal), Simone Paolo Ponzetto (Data and Web Science Group, University of Mannheim)
10:50 - 11:20 Live Q&A | Q&A or Discussion | Tool Demonstrations