eScholarship
Open Access Publications from the University of California

UCLA

UCLA Electronic Theses and Dissertations

Data-Driven Optimization to Learn Structural Models

Abstract

The rapid accumulation of high-dimensional data has opened new opportunities to make informed decisions. In this thesis, we focus on estimating structural models from observational data, using optimization and statistics, to understand the effects of strategic decisions. We develop procedures that blend techniques from economic modeling and machine learning to uncover underlying models efficiently and accurately.

In Chapter 2, we focus on understanding the effect of performance-based incentives on worker performance using historical contract data. The design of performance-based incentives can be naturally posed as a moral hazard principal-agent problem. In this setting, a key input to the principal’s optimal contracting problem is the agent’s production function – the dependence of agent output on effort. While agent production is classically assumed to be known to the principal, this is unlikely to be the case in practice. Motivated by the design of performance-based incentives, we present a method for estimating a principal-agent model from data on incentive contracts and associated outcomes, with a focus on estimating agent production. The proposed estimator is statistically consistent and can be expressed as a mathematical program. To circumvent computational challenges with solving the estimation problem exactly, we approximate it as an integer program, which we solve through a column generation algorithm that uses hypothesis tests to select variables. We show that our approximation scheme and solution technique both preserve the estimator’s consistency and combine to dramatically reduce the computational time required to obtain sound estimates. To demonstrate our method, we conducted an experiment on a crowdwork platform (Amazon Mechanical Turk) by randomly assigning incentive contracts with varying pay rates among a pool of workers completing the same task. We present numerical results illustrating how our estimator combined with experimentation can shed light on the efficacy of performance-based incentives.

In Chapter 3, we focus on learning causal structures from observational data, a process known as causal discovery. We propose a new optimization-based method for causal discovery. Our method takes as input observational data over a set of variables and returns a graph in which causal relations are specified by directed edges. We consider a highly general search space that accommodates latent confounders and feedback cycles, which few extant methods do. We formulate the discovery problem as an integer program, and propose a solution technique that leverages the conditional independence structure in the data to identify promising edges for inclusion in the output graph. Our method is among the very first to bring integer programming to general causal discovery, which we believe is one of our main contributions. In the large-sample limit, our method recovers a graph that is equivalent to the true data-generating graph. Computationally, our method is competitive with the state of the art, and can solve in minutes instances that are intractable for alternative causal discovery methods. We then extend our framework to identify, a priori, a subset of variables that collectively carry all useful information about the variable of interest. In this way, we sidestep the computational burden of learning causal relations among variables of secondary importance.

In Chapter 4, we investigate the validity of instrumental variables, which are widely used to estimate causal effects in the presence of unmeasured confounding. In particular, we apply the method developed in Chapter 3 to US Census data from the seminal paper on the returns to education by Angrist and Krueger (1991), which contains a pioneering application of an instrumental variable, but one whose validity has been contested. We find that the causal structures uncovered by our method are consistent with the literature on the instrument from Angrist and Krueger (1991), and that our method pinpoints some of the sources of debate. Our results suggest that our graphical approach can be a useful complement to well-established empirical methods.
