eScholarship
Open Access Publications from the University of California

UCLA Electronic Theses and Dissertations

Unsupervised Learning and Understanding of Deep Generative Models

Abstract

Probabilistic generative models, especially those parametrized by convolutional neural networks (ConvNets), are compact representations of knowledge and play an important role in statistics as well as artificial intelligence. The generator model and the energy-based model are two notable examples. Yet learning and understanding such models is challenging because of the high dimensionality of the input and the strong non-linearity of the network. In this dissertation, we pay particular attention to the generator model, studying its learning algorithm and the behavior of the learned model. We also develop a joint learning scheme for the generator model and the energy-based model.

To learn the generator model, we view it through the lens of a non-linear generalization of factor analysis and propose an alternating back-propagation algorithm. The algorithm iterates two steps: (1) inferential back-propagation, which infers the latent factors by Langevin dynamics or gradient descent; (2) learning back-propagation, which updates the model parameters by gradient descent given the inferred latent factors. The gradient computations in both steps are powered by back-propagation and share most of their code. We show that the alternating back-propagation algorithm can learn realistic generator models of natural images, video sequences, and sounds. Moreover, it can also learn from incomplete or indirect training data.
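The two alternating steps can be illustrated with a minimal NumPy sketch on the linear special case mentioned above (factor analysis), where the generator is simply g(z) = Wz. The dimensions, step sizes, and iteration counts below are illustrative assumptions, not the dissertation's settings; the Langevin variant would add Gaussian noise to each inference step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data from a 2-factor linear generator (the factor analysis case).
d, k, n = 10, 2, 200
W_true = rng.normal(size=(d, k))
X = rng.normal(size=(n, k)) @ W_true.T + 0.1 * rng.normal(size=(n, d))

W = rng.normal(size=(d, k))   # generator parameters, random init
Z = np.zeros((n, k))          # latent factors, one row per example
sigma2 = 0.01                 # assumed observation noise variance

for _ in range(200):
    # (1) Inferential back-propagation: gradient steps on z toward the
    #     posterior mode (a Langevin step would add noise here).
    for _ in range(10):
        resid = X - Z @ W.T
        Z += 0.05 * (resid @ W - sigma2 * Z)  # log-posterior gradient, scaled
    # (2) Learning back-propagation: gradient step on W given inferred Z.
    resid = X - Z @ W.T
    W += 0.1 * resid.T @ Z / n

recon_err = np.mean((X - Z @ W.T) ** 2)
baseline = np.mean((X - X.mean(axis=0)) ** 2)
```

In a deep generator both gradients would come from back-propagation through the same network, which is why the two steps share most of their code.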

The generator model extends naturally to multi-view representation learning, where we build a separate generator model for each domain while sharing the latent variables across domains. The proposed multi-view generator model can be learned with the same alternating back-propagation algorithm. Our experiments show that the proposed method is effective for generation, prediction, and recognition. Specifically, we demonstrate that our model can accurately rotate and complete faces as well as predict missing modalities. We also show that our model achieves state-of-the-art or competitive recognition performance in quantitative comparisons.
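The shared-latent idea can be sketched in the same linear toy setting (all dimensions, weights, and step sizes below are hypothetical): two generators share one latent vector, so inferring the latent from one observed view lets the other generator predict the missing view.

```python
import numpy as np

rng = np.random.default_rng(1)

k, dA, dB = 2, 8, 6
WA = rng.normal(size=(dA, k))  # view-A generator (linear, for illustration)
WB = rng.normal(size=(dB, k))  # view-B generator, sharing the same latent z

z_true = rng.normal(size=k)
xA = WA @ z_true               # observed view A
xB_true = WB @ z_true          # held-out view B, to be predicted

# Infer the shared z from view A alone by gradient descent on the
# reconstruction error plus a weak Gaussian prior on z.
z = np.zeros(k)
for _ in range(200):
    z += 0.05 * (WA.T @ (xA - WA @ z) - 0.1 * z)

xB_pred = WB @ z               # cross-view prediction through shared z
err = np.mean((xB_pred - xB_true) ** 2)
```

With deep generators the inference step would back-propagate through whichever views are observed, which is what makes missing-modality prediction fall out of the same algorithm.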

Further, the generator model can be learned jointly with the energy-based model. We propose a probabilistic framework, called the divergence triangle, as a compact and symmetric (anti-symmetric) objective function that seamlessly integrates variational learning, adversarial learning, the wake-sleep algorithm, and contrastive divergence. This unification makes sampling, inference, and energy evaluation readily available without the need for costly Markov chain Monte Carlo methods. Our experiments demonstrate that the divergence triangle is capable of learning (1) an energy-based model with a well-formed energy landscape, (2) direct sampling in the form of a generator model, and (3) feed-forward inference that faithfully reconstructs observed as well as synthesized data. The divergence triangle is also a robust training method that can learn from incomplete data.
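One common way to write a divergence-triangle-style objective is over three joint distributions on (x, z): the data-plus-inference joint Q, the prior-plus-generator joint P, and the energy-plus-inference joint Π. The form and notation below are a reconstruction for orientation, not a quotation of the dissertation's exact objective:

```latex
% Three joint distributions over (x, z):
%   Q(x,z)   = p_{\mathrm{data}}(x)\, q_\phi(z \mid x)   % data + inference model
%   P(x,z)   = p(z)\, p_\theta(x \mid z)                 % prior + generator model
%   \Pi(x,z) = \pi_\alpha(x)\, q_\phi(z \mid x)          % energy-based + inference model
\min_{\theta}\,\min_{\phi}\,\max_{\alpha}\;
  \mathrm{KL}(Q \,\|\, P)
+ \mathrm{KL}(P \,\|\, \Pi)
- \mathrm{KL}(Q \,\|\, \Pi)
```

Each pairwise KL term recovers a familiar learning scheme (e.g., KL(Q‖P) underlies variational learning), which is the sense in which the triangle unifies them.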

Last but not least, we draw inspiration from a recent discovery in neuroscience: for face stimuli generated by a pre-trained active appearance model (AAM), the responses of neurons in selected areas of the primate brain exhibit a strong linear relationship with the shape and appearance variables of the AAM that generated the stimuli. We show that this behavior can be replicated by a generator model. Specifically, we learn a generator model from the face images generated by a pre-trained AAM using a variational auto-encoder, and we show that the inferred latent variables of the learned generator model have a strong linear relationship with the shape and appearance variables of the AAM. Unlike the AAM, which has an explicit shape model in which the shape variables generate the landmarks, the generator model has no such shape model or shape variables. Yet the generator model learns the shape knowledge, in the sense that some of the latent variables of the learned generator network capture the shape variations in the face images generated by the AAM.
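A linearity claim of this kind can be checked with ordinary least squares: regress the AAM variables on the inferred latents and inspect the R² statistic. Below is a minimal NumPy sketch on synthetic stand-in data; the arrays A and Z are hypothetical placeholders for the AAM shape/appearance variables and the generator's inferred latents, and the dimensions and noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-ins: rows of A play the role of AAM shape/appearance variables;
# Z plays the role of inferred latents, here a linear, slightly noisy
# transform of A (the hypothesis being tested).
n, p, k = 500, 4, 6
A = rng.normal(size=(n, p))
M = rng.normal(size=(p, k))
Z = A @ M + 0.05 * rng.normal(size=(n, k))

# Least-squares regression of the AAM variables on the inferred latents.
B, *_ = np.linalg.lstsq(Z, A, rcond=None)
resid = A - Z @ B
r2 = 1.0 - resid.var() / A.var()  # fraction of variance explained
```

An R² close to 1 indicates that the latents encode the AAM variables up to a linear map, which is the quantitative sense of "strong linear relationship" used above.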
