Skip to main content
eScholarship
Open Access Publications from the University of California

UC Riverside

UC Riverside Electronic Theses and Dissertations bannerUC Riverside

Distributed Evaluation of Batches of Iterative Graph Queries

Abstract

Graph analytics has been widely used for analyzing large-scale networks using ever-growing graphs to represent the relationships and entities. The real-world graphs are often large and constantly evolving over time. Former requires parallel distributed processing across multiple multicore machines; latter requires repetitive processing over different snapshots of the graphs over time. The combined memories of multiple machines are able to hold large graphs and the large number of cores made available by multiple machines enhance the degree of parallelism delivering scalability. A significant drawback of distributed platforms is that they impose long latency operations due to their high communication latency between machines in the cluster; especially when the computation load of a graph query is small (e.g., finding the shortest path in a graph).

This research first introduces the MultiLyra distributed batching system that amortizes the communication and computation costs across multiple queries by simultaneously evaluating batches of hundreds of iterative graph queries improving the throughput while at the same time maintaining the scalability of distributed processing. Via use of a unified frontier for all queries in a batch, the overhead of distributed evaluation is amortized across queries. MultiLyra yields maximum speedups over the single query baseline ranging from 3.08x to 5.55x across different batch sizes, algorithms and input graphs which are then improved to speedups range from 7.35 to 11.86 by employing a fine-grained query tracking technique and value reuse optimization.

Second, it introduces the ExpressWay technique for faster convergence based on differential treatment of important edges in the graph to further improve the efficiency of the batching system. Each machine in the cluster loads a small portion of the edges from the graph, i.e., the important edges contributing the most in delivering the final results to the vertices, and run the graph queries independently on these edges avoiding the inter-machine communication cost before starts evaluating on the distributed full graph. Expressway benefits MultiLyra by giving additional speedups of up to 4.08x over the MultiLyra baseline for a batch size of 10 queries.

Finally, we present the BEAD system that expands the applicability of batching to evolving analytics demands when both graph and the batch of queries are allowed to grow over time. To maintain the scalability in this case, an incremental evaluation technique will be used to take advantage of the existing result of evaluation a batch of queries on the previous version of the graph to accelerate the evaluation of the batch of queries on the current version of the graph. BEAD outperforms MultiLyra's batched evaluation by factors of up to 26.16x when the graph evolves and 5.66 when the batch of queries is updated.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View