eScholarship
Open Access Publications from the University of California

UC Riverside

UC Riverside Electronic Theses and Dissertations

Towards Predictable and Dependable Real-Time DNN Inference

No data is associated with this publication.
Abstract

With the rising integration of deep neural networks (DNNs) in real-time safety-critical systems, much attention has been given to ensuring predictable and dependable execution of DNN inference workloads. The two main requirements for such executions are predictable real-time performance and dependability of output against malicious attacks. Although much research has been conducted to optimize the structure of DNNs algorithmically, limited attention has been given to supporting predictable and dependable DNN workloads from the systems perspective, especially the scheduling of real-time inference requests to various DNN models while under malicious attack. Without such support, DNN executions can be particularly problematic in safety-critical systems: response times may become unpredictable and results may be compromised under attack, and late or compromised outputs may lead to catastrophic consequences. Providing DNN executions that are both timing-predictable and output-dependable is therefore an emerging challenge.

This dissertation proposes systems and algorithmic solutions to the DNN execution problem described above. We first present DART, a DNN scheduling framework with Analyzable Real-Time guarantees, to provide real-time performance for DNN inference executions. DART offers deterministic response times to real-time tasks and increases throughput for best-effort tasks by employing a novel pipeline-based scheduling architecture with data parallelism. We then present AegisDNN, a DNN inference framework for timely and dependable execution with secure SGX enclaves, to address the dependability concern. It offloads only the critical subset of real-time DNN tasks to the protected enclave and the error-resilient workloads to the GPU. To do so, it applies a dynamic-programming-based algorithm that weighs the protection overhead against the protection effectiveness of each DNN layer, while maintaining the dependability requirement and the schedulability of the taskset. The designs of DART and AegisDNN are applicable to various DNN backends, and we evaluated their practicality and effectiveness on real platforms. To further improve the real-time performance of AegisDNN, we also present a design extension called AegisDNN++, which is designed to better utilize heterogeneous computing resources. AegisDNN++ incorporates the scheduling architecture of DART and provides updated algorithms. It also points to future systems directions for this problem. The contributions of this dissertation pave the way for designing systems with dependability and real-time predictability.

Main Content

This item is under embargo until July 20, 2024.