eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

A Data Science Approach for Real-Time HIV-Risk Analysis on Twitter

Abstract

HIV remains a major epidemic. Although treatment has progressed significantly, a functional cure is still far away, and a great deal of effort is currently focused on prevention. The prevalence of HIV has recently prompted clinicians and public health officials to look to social media as a source for digital epidemiology. This thesis introduces our data science approach for capturing HIV-related trends from multidimensional Twitter data. We show how our platform can help clinicians understand people’s risk behavior and ultimately guide HIV prevention efforts. Our design is flexible and extensible, and currently employs a collection of techniques spanning crowd-sourcing, natural language processing, image classification, supervised machine learning, and graph data analysis to classify at-risk tweets and user groups. In our experiments, we established a relationship between an individual user’s HIV risk and the risk of that user’s network, based on their actions on Twitter. This infrastructure will serve as a foundation for building visualizations and real-time analytical tools for studying the prevalence of HIV risk, to better inform the allocation of prevention resources.
