eScholarship
Open Access Publications from the University of California

UC Riverside

UC Riverside Electronic Theses and Dissertations

Parallel and Scalable Architectures for Video Encoding

Abstract

As the latest video compression standard, H.264/AVC achieves considerably better compression performance than its predecessors. Its many new coding features deliver much better rate-distortion efficiency and subjective quality, but at the cost of high computational complexity and intensive memory access. These demands on memory and computational resources lead to long processing cycles and high power consumption, which makes real-time H.264/AVC encoding difficult to implement.

To address these difficulties, this thesis focuses on fast algorithms, data reuse, and parallel architectures for the H.264/AVC encoder. For data reuse, we proposed a partially forward processing algorithm (PFPA) that reuses reference information to avoid loading the same reference data more than once. For fast algorithms, we studied the statistical features of fractional motion estimation (FME) and proposed an FME mode reduction scheme. For parallel architectures, we proposed two solutions, one for block-level and one for macroblock (MB)-level parallelization. At the block level, we proposed an FME parallel architecture that is efficient in both memory access and processing cycles, reducing memory accesses by about 67% and processing cycles by about 50% compared with most state-of-the-art architectures. At the MB level, we proposed a wavefront architecture. In theory, this architecture can extend a multi-core encoder to any desired number of cores without sacrificing encoding quality.

Both the JM reference software and Tensilica XTMP are used to verify the proposed architectures. Implementation details of the architecture are discussed, and cycle-accurate test results show good performance improvements with very small overhead. Scaling from dual-core to three-core and quad-core, the P-Core performance overhead is 0.8% and 1.3% for I-frames, and 1.7% and 2.4% for P-frames. The corresponding speed-ups over the dual-core system are 1.49 and 1.97 for I-frames, and 1.47 and 1.95 for P-frames. System up-scaling methodologies are also covered at the end of this thesis.
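The reported figures can be cross-checked with a little arithmetic. Relative to the dual-core baseline, the ideal speed-ups are 3/2 = 1.5 and 4/2 = 2.0. Assuming (an interpretation not stated in the abstract) that the P-Core overhead scales the ideal speed-up as ideal / (1 + overhead), the reported overheads reproduce the reported speed-ups to two decimal places. The function names below are illustrative only.

```python
def ideal_speedup(base_cores, cores):
    """Linear speed-up relative to a baseline core count."""
    return cores / base_cores


def implied_speedup(base_cores, cores, overhead):
    """Assumed model: ideal speed-up divided by (1 + fractional overhead)."""
    return ideal_speedup(base_cores, cores) / (1.0 + overhead)


# Overheads and measured speed-ups as reported in the abstract (dual-core baseline).
REPORTED = {
    ("I", 3): (0.008, 1.49),
    ("I", 4): (0.013, 1.97),
    ("P", 3): (0.017, 1.47),
    ("P", 4): (0.024, 1.95),
}

if __name__ == "__main__":
    for (frame, cores), (overhead, measured) in REPORTED.items():
        model = implied_speedup(2, cores, overhead)
        efficiency = measured / ideal_speedup(2, cores)
        print(f"{frame}-frame, {cores} cores: model {model:.2f}, "
              f"measured {measured:.2f}, efficiency {efficiency:.1%}")
```

Under this reading, the measured speed-ups correspond to parallel efficiencies between about 97.5% and 99.3% of the ideal values.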
