eScholarship
Open Access Publications from the University of California

UC Riverside

UC Riverside Electronic Theses and Dissertations

Methods for Mining Important User-Generated Contents and Behaviors From Online Platforms

Abstract

How effectively can we extract useful information from online platforms? Our work is motivated by the observation that online platforms, such as security forums, gaming forums, and software archives, hide significant and useful information. We argue that mining this information can greatly benefit security analysts, as it can reveal trends, patterns of behavior, emerging threats, and even malicious actors. This thesis spans three interrelated problems in this space. First, we address the problem of how to identify interesting activities in a forum for which we have no prior knowledge. We develop a systematic tensor-based tool to identify “events”, defined in a three-dimensional space of users, threads, and time. A key novelty is that we let the forum “reveal” the events of interest in an unsupervised manner, while empowering tech-savvy end-users to easily tune some parameters to influence the focus if so desired. Second, we propose a novel method that extends the tensor decomposition approach to reveal a hierarchical structure in multi-modal data in a self-adaptive way. Existing tensor decomposition-based algorithms extract only a flat clustering from multi-modal data. We apply our hierarchical method to real data from six online forums and obtain many interesting findings that validate the value of our approach. Third, we turn our attention to software archives that seem to harbor significant hacker activity, with thousands of publicly available malware repositories. The goal is to understand the collaboration dynamics of the hackers and to follow their footprints across forums as well. In this thesis, we use data from four security forums, one gaming forum, and one software archive, with 50K users, 60K threads, 150K posts, and 8.5K repositories, spanning over 5 years.
We show that our approaches are powerful, as they are able to identify: (a) interesting communities of users, (b) meaningful hierarchies of communities and events, and (c) several tight-knit groups of hackers that collaborate on malware projects. For example, we identify real events such as ransomware outbreaks (55 users, 86 threads, December 2015, February 2016), the emergence of a black market for decryption tools (34 users, 12 threads, February 2016), and romance-enabled scamming (82 users, 172 threads, March 2018). To maximize the impact of our work, we intend to make our tools and our datasets publicly available as a tangible contribution to the research community. In conclusion, we believe that our approaches and tools constitute important steps towards automated capabilities for sifting through the wealth of information in online platforms efficiently and effectively.
