eScholarship
Open Access Publications from the University of California

UC San Diego Electronic Theses and Dissertations

Fault-susceptibility Mitigation and Efficient Use of Resources in Programmable Hardware Accelerators

Abstract

Faced with the exponential growth in computing requirements, programmable hardware accelerators such as GPUs and FPGAs are becoming increasingly popular in high-performance computing systems. Given the energy-efficiency and scalability challenges in these systems, it is crucial to use hardware resources efficiently while still meeting reliability requirements. To meet system reliability requirements, traditional methods add redundancy in hardware or software. However, these redundancy-based error mitigation techniques use hardware resources inefficiently. The goal of this dissertation is to devise low-overhead approaches that mitigate the fault susceptibility of hardware accelerators and use their available resources efficiently.

For fault-susceptibility mitigation in GPU accelerators, this dissertation proposes a software-based approach that isolates faulty components through task migration. Because GPUs lack a configurable scheduler, the proposed solution uses introspective kernels to migrate tasks away from faulty components. This technique has very low performance and energy overhead, extends accelerator lifetime, and reduces overall system cost. For FPGA accelerators, faulty components are isolated with a directive-based method applied through the synthesis tool.
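The abstract does not describe the implementation, so the following is only a hedged, host-side Python model of the general idea behind introspective task migration, not GPU code and not the author's method. It assumes a persistent-thread kernel in which each thread block reads the ID of the streaming multiprocessor (SM) it runs on (on NVIDIA GPUs this is exposed through the PTX special register `%smid`) and exits immediately if that SM is marked faulty, while blocks on healthy SMs pull tasks from a shared counter that would be an atomic in device memory. The function name `run_introspective_schedule`, the round-robin block placement, and all parameters are hypothetical.

```python
from collections import Counter


def run_introspective_schedule(num_tasks, num_blocks, num_sms, faulty_sms):
    """Model persistent thread blocks that skip faulty SMs and share work.

    Hypothetical host-side model, not GPU code. Assumptions:
      * block b is placed on SM (b % num_sms); real placement is chosen by
        the hardware scheduler and is only observable from inside the kernel;
      * tasks are handed out through one shared counter, standing in for an
        atomicAdd on a global counter in device memory.
    Returns a dict mapping each task id to the SM that executed it.
    """
    # Introspection step: each block "reads" its SM id and exits at once
    # if that SM is in the faulty set, so it never claims any task.
    healthy_blocks = [b for b in range(num_blocks) if b % num_sms not in faulty_sms]
    if num_tasks > 0 and not healthy_blocks:
        raise RuntimeError("no thread block landed on a healthy SM")

    next_task = 0      # shared work counter (atomic on a real GPU)
    executed_on = {}   # task id -> SM id
    # Round-robin over healthy blocks approximates concurrent fetch-and-add:
    # work that exited blocks would have claimed is absorbed by the others.
    while next_task < num_tasks:
        for b in healthy_blocks:
            if next_task >= num_tasks:
                break
            executed_on[next_task] = b % num_sms
            next_task += 1
    return executed_on


if __name__ == "__main__":
    # 10 tasks, 8 blocks spread over 4 SMs, with SM 1 marked faulty.
    schedule = run_introspective_schedule(10, 8, 4, faulty_sms={1})
    print(Counter(schedule.values()))
```

In this model every task still runs, but none runs on SM 1. The sketch relies on dynamic work pulling rather than a static block-to-data mapping, because with a static mapping the work assigned to a block that exits on a faulty SM would simply be lost.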

This dissertation presents practical optimization methods for efficiently using the available resources on programmable hardware accelerators. These optimizations are performed at different levels of abstraction suited to GPUs and FPGAs, and the trade-offs among them are discussed. For GPUs, optimization opportunities are explored at the hardware level and the source level. For FPGAs, optimizations are studied at the compiler, source, and algorithm levels. These optimization methods seek to remove unnecessary redundancy from the program or the hardware. This dissertation demonstrates practical and efficient approaches for utilizing fault-susceptible programmable hardware accelerators and improving their efficiency in terms of both cost per performance and energy.
