eScholarship
Open Access Publications from the University of California

UC Riverside

UC Riverside Electronic Theses and Dissertations

A BAD Thesis: The Vision, Creation, and Evaluation of a Big Active Data Platform

Abstract

Virtually all of today's Big Data systems are passive in nature, responding to queries posed by their users.

Instead, this thesis aims to shift Big Data platforms from passive to active. A Big Active Data (BAD) system should continuously and reliably capture Big Data while enabling timely and automatic delivery of relevant information to a large pool of interested users, as well as supporting retrospective analyses of historical information.

While various scalable streaming query engines have been created, their active behavior is limited to a (relatively) small window of the incoming data.

To this end, this thesis presents a BAD platform that combines ideas and capabilities from both Big Data and Active Data (e.g., publish/subscribe systems and streaming engines). It supports complex subscriptions that consider not only newly arrived items but also their relationships to past, stored data. Further, it can provide actionable notifications by enriching subscription results with other useful data. The platform extends an existing open-source Big Data Management System, Apache AsterixDB, with an active toolkit. The toolkit contains features to rapidly ingest semi-structured data, share execution pipelines among users, manage user data subscriptions at scale, and actively monitor the state of the data to produce individualized information for each user.
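
The idea of a subscription that looks beyond the newest item can be made concrete with a minimal in-memory sketch. It is not the BAD platform's or AsterixDB's API: the class `ActiveStore`, its methods, the report fields (`id`, `city`, `kind`), and the shelter lookup are all hypothetical. Each new report is related to stored history (a count of earlier reports of the same kind in the same city), enriched from a lookup table, and evaluated once per distinct subscription parameter before being fanned out to every user who shares that parameter, loosely mirroring the idea of sharing execution among users.

```python
from collections import defaultdict


class ActiveStore:
    """Toy stand-in for an active data store (hypothetical, not AsterixDB's API)."""

    def __init__(self, shelters):
        self.history = []                      # every report ever ingested (stored, past data)
        self.shelters = shelters               # enrichment table: city -> nearest shelter
        self.subscribers = defaultdict(list)   # city -> user ids sharing that parameter

    def subscribe(self, user_id, city):
        """Register a user's interest in reports for one city."""
        self.subscribers[city].append(user_id)

    def ingest(self, batch):
        """Store each new report and return one enriched notification per interested user."""
        notifications = []
        for report in batch:
            city = report["city"]
            # Relate the new item to stored data: earlier reports of the same kind in this city.
            prior = sum(1 for old in self.history
                        if old["city"] == city and old["kind"] == report["kind"])
            self.history.append(report)
            users = self.subscribers.get(city, [])  # .get does not create empty entries
            if not users:
                continue
            # Evaluate once per distinct parameter value, then fan out to all users sharing it.
            result = {"report": report["id"], "prior_reports": prior,
                      "shelter": self.shelters.get(city)}
            notifications.extend({"user": u, **result} for u in users)
        return notifications


if __name__ == "__main__":
    store = ActiveStore(shelters={"Riverside": "Shelter A"})
    store.subscribe("alice", "Riverside")
    store.subscribe("bob", "Riverside")
    store.ingest([{"id": 1, "city": "Riverside", "kind": "fire"}])
    out = store.ingest([{"id": 2, "city": "Riverside", "kind": "fire"},
                        {"id": 3, "city": "Irvine", "kind": "flood"}])
    for note in out:
        print(note)  # one line each for alice and bob, both with prior_reports == 1
```

The linear scan over `history` stands in for what a real system would answer with an indexed query over stored data; it keeps the sketch short, not fast.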

This thesis describes the features and design of the current BAD system and demonstrates its ability to scale without sacrificing query capability or individualization.

One component of the BAD platform, the Data Feed, relies on a storage structure designed for fast ingestion: the Log-Structured Merge-Tree (LSM-Tree). Accordingly, this thesis also presents a formal evaluation and performance comparison of theoretical and existing LSM merge policies for fast ingestion.
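
The trade-off that an LSM merge policy manages can be illustrated with a toy simulation, assuming a deliberately simplified cost model: every flush writes one unit-size disk component, and a merge costs the total size of the components it rewrites. The two policies below, `full_merge(k)` and `tiered(t)`, are hypothetical illustrations, not the policies evaluated in the thesis or the ones shipped with AsterixDB.

```python
def simulate(policy, flushes):
    """Return (total merge cost, final component sizes) after `flushes` unit-size flushes.

    Components are kept newest-first; merging a run of the newest components costs the
    sum of their sizes, since every byte in them is rewritten once.
    """
    stack = []  # disk component sizes, newest first
    cost = 0
    for _ in range(flushes):
        stack.insert(0, 1)       # flush one in-memory component to disk
        run = policy(stack)      # how many of the newest components to merge (0 = none)
        while run > 1:
            merged = sum(stack[:run])
            cost += merged
            stack[:run] = [merged]
            run = policy(stack)  # a merge may enable another (cascading) merge
    return cost, stack


def full_merge(k):
    """Hypothetical policy: once more than k components exist, merge all of them."""
    return lambda stack: len(stack) if len(stack) > k else 0


def tiered(t):
    """Hypothetical size-tiered policy: merge the t newest components when equal in size."""
    def policy(stack):
        if len(stack) >= t and len(set(stack[:t])) == 1:
            return t
        return 0
    return policy


if __name__ == "__main__":
    for name, policy in [("full_merge(2)", full_merge(2)), ("tiered(2)", tiered(2))]:
        cost, comps = simulate(policy, 64)
        print(f"{name}: merge cost={cost}, components={comps}")
    # full_merge(2): merge cost=1023, components=[1, 63]
    # tiered(2): merge cost=384, components=[64]
```

Under this model, merging everything keeps the component count small (fewer components to search on a lookup) but rewrites the same data repeatedly, so its cost grows roughly quadratically with the number of flushes; the tiered policy rewrites each unit about log2(n) times, at the price of keeping up to about log2(n) components between merges.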
