eScholarship
Open Access Publications from the University of California

UC Santa Cruz

UC Santa Cruz Electronic Theses and Dissertations

Learning from new perspectives: Using sparse data and multiple views to predict cancer progression and treatment

Abstract

Advancements in sequencing technology have led to an influx of cancer genomics data, transforming cancer research into a field limited by data interpretation rather than acquisition. Machine learning methods that can make use of this wealth of data are desperately needed. Patient stratification, meanwhile, is a critical task in cancer diagnosis and treatment. While stratification approaches that use various biomarkers for patient-to-patient comparisons have succeeded in elucidating previously unseen subtypes, the potential of many other sparse but rich genotype and phenotype data (e.g., tumor images) remains untapped.

To this end, I present two methods. The first uses social network analysis techniques to extract subtypes from sparse data. The second is a semi-supervised multiview learning framework that integrates both prior knowledge and a variety of genomic data to predict outcomes in cancer. Crucially, this method accommodates samples for which we have different data types, paving the way for integration of data from past studies.

I apply these methods to several cancer datasets. Of note, I show that TCGA-defined molecular subtypes of glioblastoma are independent of both tumor location and volume, and that both the imaging and genomic data provide important perspectives of the disease. Analysis of a large drug sensitivity database identifies an epigenetic effect from chromatin modifiers that lends sensitivity to Panobinostat. Multiview learning, the second method I developed, also outperforms other methods in predicting sensitivity in all of the study drugs. In this dissertation I begin with unsupervised single-platform analysis, then combine multiple platforms, and finally analyze many data platforms using semi-supervised analysis.
