UC Irvine Electronic Theses and Dissertations
Relaxation and Optimization for Automated Learning of Neural Network Architectures

License: Creative Commons Attribution 4.0 (CC BY 4.0)
Abstract

Differentiable architecture search (DARTS) is an effective method for data-driven neural network design based on bilevel optimization. Despite its success in many architecture search tasks, concerns remain about the accuracy of first-order DARTS and the efficiency of second-order DARTS. In this dissertation, we formulate a single-level alternative, the relaxed architecture search (RARTS) method, which utilizes the whole dataset in architecture learning via both data and network splitting and, unlike DARTS, does not involve mixed second derivatives of the loss functions. The advantage of RARTS over DARTS is justified by a convergence theorem and an analytically solvable model. Moreover, RARTS outperforms DARTS and its variants in accuracy and search efficiency, as shown by extensive experiments on the CIFAR-10 and ImageNet image classification datasets and on the public architecture search benchmark NATS-Bench.
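To make the contrast concrete, the standard DARTS bilevel formulation and one way to relax it into a single level are sketched below. The quadratic coupling term and the weights lambda, beta in the second line are illustrative assumptions; the precise RARTS objective is the one developed in the dissertation.

    % DARTS (standard bilevel formulation):
    \min_{\alpha} \; L_{\mathrm{val}}\!\big(w^{*}(\alpha), \alpha\big)
    \quad \text{s.t.} \quad
    w^{*}(\alpha) \in \arg\min_{w} \; L_{\mathrm{train}}(w, \alpha)

    % Single-level relaxation (illustrative sketch): split the data into
    % two parts with losses L_1, L_2, split the network weights into two
    % copies u and w, and couple the copies with a quadratic penalty:
    \min_{u,\, w,\, \alpha} \;
    L_{1}(u, \alpha) \;+\; \lambda\, L_{2}(w, \alpha)
    \;+\; \frac{\beta}{2}\, \lVert u - w \rVert_{2}^{2}

Alternating gradient steps on u, w, and alpha for such a relaxed objective require only first derivatives of L_1 and L_2, which is what removes the mixed second-derivative term that second-order DARTS must approximate.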

Since network pruning is closely related to architecture search in the form of width and depth search, we also adapt RARTS to width search and summarize it as a general framework. Experiments show that our method outperforms previous benchmarks for PreResNet-164 pruning on the CIFAR datasets. Additionally, many researchers have shown that transformers perform as well as convolutional neural networks on many computer vision tasks; meanwhile, the large computational cost of their attention modules hinders further study and application on edge devices. Some pruning methods have been developed to construct efficient vision transformers, but most of them consider image classification tasks only. Inspired by these results, we extend our method, based on a search over transformer dimensions, to prune vision transformer backbones for more complicated vision tasks such as object detection. Experiments on the CIFAR-100 and COCO datasets show that backbones with 20% or 40% of their dimensions/parameters pruned can match or even exceed the performance of the unpruned models. Finally, we provide a complexity analysis and comparisons with previous pruning methods.
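As a toy illustration of what pruning a fixed fraction of dimensions means, the PyTorch sketch below drops output dimensions of a projection matrix by a simple L1-magnitude criterion. The helper name and the magnitude heuristic are assumptions for illustration only; the dissertation's framework instead learns which dimensions to keep through the RARTS search.

    import torch

    def prune_dims(weight: torch.Tensor, ratio: float) -> torch.Tensor:
        """Illustrative magnitude-based dimension pruning (hypothetical
        helper, not the dissertation's criterion): keep the (1 - ratio)
        fraction of output dimensions with the largest L1 norms."""
        n_keep = int(weight.size(0) * (1.0 - ratio))
        scores = weight.abs().sum(dim=1)           # L1 norm per output dim
        keep = torch.topk(scores, n_keep).indices  # indices of kept dims
        return weight[keep.sort().values]          # pruned rows, order kept

    # Example: prune 40% of a 256-dim projection, one of the reported ratios.
    w = torch.randn(256, 256)
    w_pruned = prune_dims(w, ratio=0.40)
    print(w_pruned.shape)                          # torch.Size([153, 256])

A learned criterion can replace the L1 scores without changing the surrounding bookkeeping, which is why width search and pruning fit the same general framework.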
