Potapov, Alexander

A Statistical Investigation of Model Quality in Generative Systems

2018

Advisor(s): Kreutz-Delgado, Kenneth

Abstract

Machine Learning is a powerful tool for both processing and generating data. It has been shown to outperform humans at distinguishing between hundreds of different types of images. Metrics for classification and game-playing tasks are easy to quantify, which makes it simple to demonstrate improvements in efficiency and quality over previous iterations. In contrast, Generative Models (GMs), which create synthetic samples by modeling a data distribution, are much harder to evaluate cleanly. For image data, Maximum Mean Discrepancy (MMD) has recently become a popular evaluation approach because it compares two sample sets non-parametrically and assigns a score that quantifies the discrepancy between their underlying distributions. A major limitation of MMD is that there is no accepted method for choosing a target score at which generated images can be deemed similar enough to real images to pass an independent analysis. Human evaluation has served as a stopgap, but it introduces a subjective element into the workflow. An ideal solution would remove the human element entirely and determine an appropriate MMD-based quality target using only the data provided. In this thesis, we present such a solution: a novel statistical test that compares data distributions more accurately and determines a target score using only the sample sets. By evaluating this test on a variety of models, we can train and analyze models that reach a sufficient level of quality through a fully automated procedure.
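The abstract relies on MMD without stating how it is computed. As a minimal sketch, assuming a Gaussian (RBF) kernel with a hand-picked bandwidth, the standard unbiased estimator of the squared MMD between two sample sets can be written as below. The `permutation_threshold` helper uses an ordinary permutation test to derive a data-driven cutoff; it is a well-known baseline, not the novel test proposed in this thesis, and is included only to illustrate what "a target score using only the sample sets" can look like. All function names and default parameters here are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs: List[Vector], ys: List[Vector], bandwidth: float = 1.0) -> float:
    """Unbiased estimate of MMD^2 between samples xs (size m) and ys (size n).

    MMD^2 = E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]; the within-sample sums skip
    i == j so that each expectation is estimated without bias.
    """
    m, n = len(xs), len(ys)
    k_xx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j)
    k_yy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j)
    k_xy = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)


def permutation_threshold(xs: List[Vector], ys: List[Vector], alpha: float = 0.05,
                          n_perm: int = 200, bandwidth: float = 1.0,
                          seed: int = 0) -> Tuple[float, float]:
    """Return (observed MMD^2, cutoff) from a standard permutation test.

    The cutoff is the (1 - alpha) quantile of MMD^2 computed over random
    relabelings of the pooled sample, i.e. the null "same distribution" case.
    """
    rng = random.Random(seed)
    pooled = list(xs) + list(ys)
    m = len(xs)
    null_stats = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random split of the pooled data into two groups
        null_stats.append(mmd2_unbiased(pooled[:m], pooled[m:], bandwidth))
    null_stats.sort()
    idx = min(int(math.ceil((1.0 - alpha) * n_perm)) - 1, n_perm - 1)
    return mmd2_unbiased(xs, ys, bandwidth), null_stats[idx]


if __name__ == "__main__":
    data_rng = random.Random(1)
    real = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(40)]
    fake = [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(40)]
    observed, cutoff = permutation_threshold(real, fake, n_perm=100)
    # A score above the cutoff suggests the two samples come from different distributions.
    print(f"MMD^2 = {observed:.4f}, cutoff = {cutoff:.4f}")
```

Because the unbiased estimator can dip slightly below zero when both samples come from the same distribution, comparing it against a null-derived cutoff is more meaningful than reading the raw value in isolation. The quadratic cost in sample size is acceptable for a sketch; practical evaluations usually vectorize the kernel computations.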


UC San Diego
