Unsupervised Summarization of Privacy Concerns in Mobile Application Reviews
The proliferation of mobile applications (apps) over the past decade has imposed unprecedented challenges on end-user privacy. Apps constantly demand access to sensitive user information in exchange for more personalized services. These mostly unjustified data collection tactics have raised major concerns among mobile app users. These concerns are commonly expressed in mobile app reviews. However, privacy concerns are typically overshadowed by more generic categories of user feedback, often related to app reliability and usability. This makes extracting these concerns, whether manually or with automated methods, a challenging task. To address this challenge, we propose an effective unsupervised approach for summarizing privacy concerns in mobile app reviews. Our analysis is conducted using a dataset of 2.6 million app reviews sampled from three different application domains. The results show that users in different application domains express their privacy concerns using different vocabulary. This domain knowledge can be leveraged to help unsupervised text summarization algorithms generate concise summaries of privacy concerns in review collections. Our analysis is intended to help app developers quickly and accurately identify the most critical privacy concerns in their domain of operation.