eScholarship
Open Access Publications from the University of California


UCLA Electronic Theses and Dissertations

Adaptive AI Algorithms for Generic Hardware and Unified Hardware Acceleration Architecture

Abstract

We are now in the era of an artificial intelligence (AI) big bang. In this wave of revolution, both industry and academia have invested substantial funds and resources. Machine learning, especially deep learning, has been widely deployed to replace traditional algorithms in many domains, from Euclidean to non-Euclidean data. As the complexity and scale of AI algorithms increase, the systems hosting these algorithms require more computational power and resources than before. Using the design of the modules of a video analytics platform as use cases, we analyze the workload costs of computational resources and memory allocation during the execution of the system. The video analytics platform is a complex system comprising various computer vision and decision-making tasks; each module accomplishing a specific task is a stage in the platform's pipeline. Based on these analyses, we synthesize adaptive AI algorithms from the availability and variability perspectives, such as optimization through tensorization or matricization. We conceive the sparse Transformer and the segmented linear Transformer as the critical components for the human action recognition task. The Constraint Satisfaction Problem is employed to assist decision-making in the scene parsing stage; to facilitate this task, we design a hybrid graph-learning-based SAT solver. Graph matching is employed at the final stage for the scene understanding task, for which we implement a hybrid model of GNN and Transformer architectures. Finally, we design a unified hardware acceleration architecture for both dense and sparse data based on these algorithmic optimizations. Our architecture targets arithmetic operation kernels, such as matrix multiplication, with the help of data transformation and rearrangement.
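The matricization mentioned above can be illustrated with a minimal NumPy sketch (the function name `mode_n_unfold` is ours for illustration; the dissertation does not specify an implementation). Mode-n unfolding reshapes a higher-order tensor into a matrix whose rows are indexed by the chosen mode:

```python
import numpy as np

def mode_n_unfold(tensor, mode):
    """Mode-n matricization: move the chosen mode to the front, then
    flatten the remaining modes into the columns of a matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

# A 3-way tensor of shape (2, 3, 4) unfolds along mode 1 into a 3 x 8 matrix.
T = np.arange(24).reshape(2, 3, 4)
M = mode_n_unfold(T, 1)
print(M.shape)  # (3, 8)
```

Once unfolded, the tensor operation can be dispatched to an ordinary matrix multiplication kernel, which is the form the acceleration architecture targets.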
For dense convolution operations, we first transform the inputs and weights with the Winograd transform, then feed the transformed data to the matrix multiplication accelerator. For sparse data, we must use the indices of nonzero elements to fetch data; therefore, indexing, scattering, and gathering are crucial components, and their effective implementation dramatically improves the system's overall performance. To improve the matrix multiplication accelerator's efficiency and reduce both the number of heavy arithmetic operations and the number of memory accesses, we also implement a hardware-based recursive algorithm, namely Strassen's algorithm for matrix multiplication.
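The Strassen recursion referenced above can be sketched in software as follows. This is a minimal NumPy sketch assuming square matrices with power-of-two dimensions; the dissertation's actual design is a hardware implementation, which this does not reproduce. The key saving is 7 recursive multiplications per level instead of 8:

```python
import numpy as np

def strassen(A, B):
    """Strassen's recursive matrix multiplication for square
    matrices whose size is a power of two."""
    n = A.shape[0]
    if n <= 2:  # base case: ordinary multiplication
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Seven products replace the eight of the naive block algorithm.
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```

The extra additions and subtractions map to cheap adders in hardware, while each avoided multiplication removes a full pass through the heavy multiply kernel, which is where the arithmetic and memory-access savings come from.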
