Manufacturing and environmental variability lead to timing errors in
computing systems that are typically corrected by error detection and
correction mechanisms at the circuit level. The cost and speed of recovery can
be improved by memoization-based optimization methods that exploit spatial or
temporal parallelism in suitable computing fabrics such as general-purpose
graphics processing units (GPGPUs). We propose here a temporal memoization
technique for use in floating-point units (FPUs) in GPGPUs that uses value
locality inside data-parallel programs. The technique memorizes, and later
recalls, the context of an error-free execution of an instruction on an FPU,
thereby avoiding redundant execution and saving FPU energy. To enable scalable and
independent recovery, a single-cycle lookup table (LUT) is tightly coupled to
every FPU to maintain contexts of recent error-free executions. The LUT reuses
these memorized contexts to exactly, or approximately, correct errant FP
instructions based on application needs. In real-world applications, the
temporal memoization technique achieves an average energy saving of 13%–25% for
a wide range of timing error rates (0%–4%) and outperforms recent advances in
resilient architectures. This technique also enhances robustness in the voltage
overscaling regime and achieves a relative average energy saving of 44% with 11%
voltage overscaling.
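
The lookup-and-reuse idea described above can be illustrated with a minimal software sketch. It is not the hardware design: the class `MemoFPU`, the helper `mask_low_bits`, the FIFO replacement policy, the table size, and the `timing_error` flag that stands in for a circuit-level error detector are all assumptions made for illustration. Approximate matching is modeled here by ignoring a configurable number of low-order mantissa bits of the single-precision operands, which is one plausible way to relax the match and not necessarily the one used in the work.

```python
import struct
from collections import OrderedDict

# Hypothetical operation set; a real FPU supports many more opcodes.
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


def mask_low_bits(x: float, bits: int) -> int:
    """Return the float32 bit pattern of x with the lowest `bits` bits cleared."""
    (raw,) = struct.unpack("<I", struct.pack("<f", x))
    return raw & ~((1 << bits) - 1) & 0xFFFFFFFF


class MemoFPU:
    """Software model of one FPU with a small, private memoization table."""

    def __init__(self, entries: int = 2, ignore_bits: int = 0):
        self.lut = OrderedDict()        # (op, a_bits, b_bits) -> result
        self.entries = entries          # assumed table capacity
        self.ignore_bits = ignore_bits  # 0 means exact matching
        self.hits = 0
        self.executions = 0
        self.recoveries = 0

    def _key(self, op, a, b):
        return (op, mask_low_bits(a, self.ignore_bits),
                mask_low_bits(b, self.ignore_bits))

    def execute(self, op, a, b, timing_error=False):
        key = self._key(op, a, b)
        if key in self.lut:
            # Hit: reuse the stored error-free result and skip execution,
            # which also masks a timing error on this instruction.
            self.hits += 1
            return self.lut[key]
        self.executions += 1
        result = OPS[op](a, b)
        if timing_error:
            # Miss on an errant instruction: assume a baseline recovery
            # (for example, replay) produced the correct result.
            self.recoveries += 1
            return result
        # Only error-free executions are memorized; evict oldest first.
        if len(self.lut) >= self.entries:
            self.lut.popitem(last=False)
        self.lut[key] = result
        return result


fpu = MemoFPU(entries=2)
first = fpu.execute("add", 1.5, 2.0)                      # miss, stored
second = fpu.execute("add", 1.5, 2.0, timing_error=True)  # hit, no recovery
third = fpu.execute("mul", 3.0, 2.0, timing_error=True)   # miss, recovered

approx = MemoFPU(entries=2, ignore_bits=12)
approx.execute("add", 1.0, 2.0)
near = approx.execute("add", 1.0001, 2.0)  # matches the 1.0 entry approximately
```

In this sketch an errant instruction that hits in the table costs no recovery, while one that misses falls back to the assumed baseline mechanism. With `ignore_bits` set above zero, a hit returns the result of a nearby operand pair, so the output is approximate by construction.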
Pre-2018 CSE ID: CS2014-1006