Temporal Memoization for Energy-Efficient Timing Error Recovery in GPGPU Architectures
eScholarship
Open Access Publications from the University of California

Abstract

Manufacturing and environmental variability lead to timing errors in computing systems that are typically corrected by error detection and correction mechanisms at the circuit level. The cost and speed of recovery can be improved by memoization-based optimization methods that exploit spatial or temporal parallelism in suitable computing fabrics such as general-purpose graphics processing units (GPGPUs). We propose a temporal memoization technique for floating-point units (FPUs) in GPGPUs that exploits value locality inside data-parallel programs. The technique memorizes the context of an error-free execution of an instruction on an FPU and reuses it, which avoids redundant execution and saves FPU energy. To enable scalable and independent recovery, a single-cycle lookup table (LUT) is tightly coupled to every FPU to maintain the contexts of recent error-free executions. The LUT reuses these memorized contexts to correct errant FP instructions either exactly or approximately, depending on application needs. In real-world applications, the temporal memoization technique achieves an average energy saving of 13%–25% for a wide range of timing error rates (0%–4%) and outperforms recent advances in resilient architectures. The technique also enhances robustness in the voltage overscaling regime and achieves a relative average energy saving of 44% with 11% voltage overscaling.
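
The lookup-and-reuse idea described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the report: the class name `TemporalMemoLUT`, the function `execute`, the two-entry FIFO replacement policy, the single-precision operand encoding, and the approach of approximate matching by ignoring low-order mantissa bits are all hypothetical choices made for illustration. The model also assumes that a baseline recovery mechanism delivers a correct result whenever an errant instruction misses in the LUT.

```python
import struct
from collections import OrderedDict


def float_bits(x):
    """Return the IEEE-754 single-precision bit pattern of x as an int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


class TemporalMemoLUT:
    """Hypothetical per-FPU table of recent error-free executions."""

    def __init__(self, entries=2, ignored_mantissa_bits=0):
        # Assumption: tiny FIFO table; real sizes and policy may differ.
        self.entries = entries
        # Assumption: approximate matching ignores low-order mantissa bits.
        self.mask = 0xFFFFFFFF & ~((1 << ignored_mantissa_bits) - 1)
        self.table = OrderedDict()

    def _key(self, opcode, operands):
        return (opcode,) + tuple(float_bits(v) & self.mask for v in operands)

    def lookup(self, opcode, operands):
        """Return the memorized result, or None on a miss."""
        return self.table.get(self._key(opcode, operands))

    def record(self, opcode, operands, result):
        """Memorize the context of an error-free execution."""
        key = self._key(opcode, operands)
        if key not in self.table and len(self.table) >= self.entries:
            self.table.popitem(last=False)  # evict the oldest entry (FIFO)
        self.table[key] = result


def execute(lut, opcode, operands, fpu, timing_error):
    """Model one FP instruction; returns (result, how it was obtained)."""
    cached = lut.lookup(opcode, operands)
    if cached is not None:
        # Hit: reuse the stored result, so any timing error is irrelevant.
        return cached, "reused"
    result = fpu(*operands)
    if timing_error:
        # Miss with an error: assumed baseline recovery, nothing memorized.
        return result, "recovered"
    lut.record(opcode, operands, result)
    return result, "executed"
```

In this model, a repeated instruction with matching operands is served from the table even when a timing error is flagged, which mirrors the abstract's claim that reuse both avoids redundant execution and corrects errant instructions. Setting `ignored_mantissa_bits` above zero lets nearby operand values share an entry, which corresponds to the approximate correction mode at the cost of a small numerical deviation.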

Pre-2018 CSE ID: CS2014-1006
