eScholarship
Open Access Publications from the University of California

UCLA Electronic Theses and Dissertations

Emergent Properties of Deep Neural Networks

Abstract

We show that information-theoretic quantities can be used to control and describe the training process of deep neural networks, and can explain how properties such as invariance to nuisance variability and disentanglement of semantic factors emerge naturally in the learned representation. Through its dynamics, stochastic gradient descent (SGD) implicitly regularizes the information in the weights, which can then be used to bound the generalization error via the PAC-Bayes bound. Moreover, the information in the weights can be used to define both a topology and an asymmetric distance in the space of tasks, which in turn can be used to predict the training time and the performance on a new task given a solution to a pre-training task. While this information distance models the difficulty of transfer to a first approximation, we show the existence of non-trivial irreversible dynamics during the initial transient phase of convergence, when the network is acquiring information, which causes the approximation to fail. This phenomenon is closely related to critical learning periods in biology, and suggests that studying the initial convergence transient can yield insights beyond those that can be gleaned from the well-studied asymptotics.
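For context, one commonly cited (McAllester-style) form of the PAC-Bayes bound referenced above is sketched below; the notation (prior P over weights, posterior Q, training set S of N examples, confidence parameter delta, loss bounded in [0,1]) is standard but is our own choice, not the dissertation's.

\[
\mathbb{E}_{w \sim Q}\!\left[ L_{\mathcal{D}}(w) \right]
\;\le\;
\mathbb{E}_{w \sim Q}\!\left[ \hat{L}_{S}(w) \right]
\;+\;
\sqrt{\frac{\mathrm{KL}\!\left( Q \,\|\, P \right) + \ln \tfrac{N}{\delta}}{2(N-1)}}
\]

The inequality holds with probability at least 1 − delta over the draw of the N training examples, for any prior P fixed before seeing the data and any posterior Q over the weights. The KL(Q‖P) term plays the role of the "information in the weights" discussed in the abstract: to the extent that SGD implicitly keeps this quantity small, the gap between test and training loss is correspondingly controlled.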
