UC Irvine Electronic Theses and Dissertations

Training Efficient Neural Network Models via Pruning and Knowledge Distillation

Abstract

A relaxed group-wise splitting method (RGSM) is developed and evaluated for channel pruning of deep neural networks. Experiments with VGG-16 and ResNet-18 architectures on the CIFAR-10/100 image datasets show that RGSM achieves much higher channel sparsity than the group Lasso method while maintaining comparable accuracy.
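The group Lasso baseline referred to above penalizes the L2 norms of channel-wise groups of convolutional weights, which drives whole channels toward zero. A minimal PyTorch sketch of such a penalty is given below; the coefficient lam and the restriction to Conv2d layers are illustrative assumptions, not the exact setup used in the thesis, and the RGSM relaxation itself is not reproduced here.

# Sketch (assumption): channel-wise group Lasso penalty on conv weights.
# Each output channel of a Conv2d layer forms one group; the penalty is the
# sum of the L2 norms of the groups, encouraging entire channels to vanish.
import torch
import torch.nn as nn

def group_lasso_penalty(model: nn.Module, lam: float = 1e-4) -> torch.Tensor:
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            # weight shape: (out_channels, in_channels, kH, kW)
            w = m.weight.view(m.weight.size(0), -1)  # one row per output channel
            penalty = penalty + w.norm(dim=1).sum()
    return lam * penalty

# usage: total_loss = cross_entropy_loss + group_lasso_penalty(net)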

Multi-resolution paths and multi-scale feature representation are key elements of semantic segmentation networks. We develop two techniques for efficient networks based on the recent FasterSeg network architecture. The first is to use a state-of-the-art high-resolution network (e.g., HRNet) as a teacher to distill a lightweight student network. Because the teacher and student networks have dissimilar structures, carrying out distillation directly in the standard way is not effective. To solve this problem, we introduce a tutor network with an added high-resolution path to help distill the student, which improves the FasterSeg student while maintaining its parameter/FLOPs counts. The second is to replace the standard bilinear interpolation in the upscaling module of the FasterSeg student network with a depth-wise separable convolution and a PixelShuffle module, which leads to 1.9% (1.4%) mIoU improvements at low (high) input image sizes without increasing model size.
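As a rough illustration of the second change, the sketch below combines a depth-wise separable convolution with PixelShuffle as a drop-in replacement for bilinear 2x upsampling; the channel counts and the module name UpsampleDSPS are illustrative assumptions rather than the thesis configuration.

# Sketch (assumption): the depth-wise separable convolution expands channels
# by scale^2, then PixelShuffle rearranges the extra channels into a larger
# spatial grid, replacing bilinear interpolation without a large cost.
import torch
import torch.nn as nn

class UpsampleDSPS(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, scale: int = 2):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch * scale * scale, 1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.shuffle(self.pointwise(self.depthwise(x)))

# usage: UpsampleDSPS(64, 64)(torch.randn(1, 64, 32, 64)).shape -> (1, 64, 64, 128)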

A Fast Feature Affinity loss is developed for intermediate-feature knowledge distillation. It requires less computation and storage than the original Feature Affinity loss. Experiments with modified EfficientNet architectures on CIFAR-100 show that both the Feature Affinity loss and the Fast Feature Affinity loss improve network accuracy and perform comparably.
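For context, a standard feature affinity loss matches the pairwise affinities between spatial positions of the teacher and student feature maps; a minimal sketch is given below. The exact normalization in the thesis may differ, and the fast variant that reduces the N x N affinity cost is not reproduced here.

# Sketch (assumption): features are L2-normalized per spatial location,
# pairwise affinity matrices over the N = H*W positions are formed for the
# student and teacher, and their mean squared difference is the loss.
import torch
import torch.nn.functional as F

def feature_affinity_loss(f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
    # f_s, f_t: (B, C, H, W); channel counts may differ between the networks
    s = F.normalize(f_s.flatten(2), dim=1)   # (B, C_s, N)
    t = F.normalize(f_t.flatten(2), dim=1)   # (B, C_t, N)
    A_s = torch.bmm(s.transpose(1, 2), s)    # (B, N, N) student affinity
    A_t = torch.bmm(t.transpose(1, 2), t)    # (B, N, N) teacher affinity
    return F.mse_loss(A_s, A_t)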

A compact DETR-based architecture is proposed for human-only detection. By replacing the backbone of DETR with MobileNet-V3 and shrinking the decoder, we first obtain a baseline model. We then replace the transformer encoder with a convolutional encoder. Experiments show that the convolution-based encoders achieve better performance with fewer FLOPs and parameters.
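As a rough illustration of the encoder swap, the sketch below stacks a few depth-wise separable convolution blocks over the backbone feature map in place of DETR's transformer encoder; the block count, channel width, and the class name ConvEncoder are illustrative assumptions, not the architecture reported in the thesis.

# Sketch (assumption): a small convolutional encoder on the backbone feature
# map (B, C, H, W). Its output keeps the same shape and can be flattened to
# (HW, B, C) before entering the DETR decoder.
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    def __init__(self, dim: int = 256, num_blocks: int = 3):
        super().__init__()
        layers = []
        for _ in range(num_blocks):
            layers += [
                nn.Conv2d(dim, dim, 3, padding=1, groups=dim),  # depth-wise
                nn.Conv2d(dim, dim, 1),                          # point-wise
                nn.BatchNorm2d(dim),
                nn.ReLU(inplace=True),
            ]
        self.body = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)  # same shape as the input feature map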
