eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

Endogenous Econometric Models and Multi-Stage Estimation in High-Dimensional Settings: Theory and Applications

Abstract

Econometric models based on observational data are often endogenous due to measurement error, autocorrelated errors, simultaneity, omitted variables, non-random sampling, self-selection, etc. Without corrective measures, parameter estimates of these models may be inconsistent. The potentially high-dimensional nature of these models (where the dimension of the parameters of interest is comparable to or even larger than the sample size) further complicates statistical estimation and inference. My dissertation studies two different types of high-dimensional endogenous econometric problems in depth and develops statistical tools together with their theoretical guarantees.

The first essay in this dissertation explores the validity of the two-stage regularized least squares estimation procedure for sparse linear models in high-dimensional settings with possibly many endogenous regressors. The second essay focuses on the semiparametric sample selection model in high-dimensional settings under a weak nonparametric restriction on the form of the selection correction, for which a multi-stage projection-based regularized procedure is proposed. The number of regressors in the main equation, p, and the number of regressors in the first-stage equation, d, can grow with and exceed the sample size n in the respective models. The analysis considers the sparsity case where the number of non-zero components in the vectors of coefficients is bounded above by some integer that is allowed to grow with n, but slowly compared to n, as well as the case where the vectors of coefficients can be approximated by exactly sparse vectors. Simulations are conducted to gain insight into the small-sample performance of these high-dimensional multi-stage estimators. The proposed estimators in the second essay are also applied to study the pricing decisions of gasoline retailers in the Greater Saint Louis area.
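As a rough sketch of the setting in the first essay, a two-stage regularized least squares procedure might be written as follows. All notation here is assumed for illustration and is not taken from the dissertation: the response y_i, regressors x_{ij}, first-stage regressors z_i, coefficients β and π_j, tuning parameters λ_1 and λ_2, and the sparsity bound k_n are hypothetical names. The ℓ1 (Lasso-type) penalty is likewise an assumption, since the abstract only says "regularized".

```latex
% Illustrative notation only; every symbol below is an assumption.

% Main equation with p regressors, some of which may be endogenous:
\[
  y_i = \sum_{j=1}^{p} x_{ij}\,\beta_j + \varepsilon_i ,
  \qquad i = 1, \dots, n .
\]

% First-stage equation for regressor j, with d first-stage regressors z_i:
\[
  x_{ij} = z_i^{\top}\pi_j + \eta_{ij} ,
  \qquad \pi_j \in \mathbb{R}^{d}, \quad j = 1, \dots, p .
\]

% Stage 1: penalized least squares of each regressor on z_i.
\[
  \hat{\pi}_j \in \arg\min_{\pi \in \mathbb{R}^{d}}
  \frac{1}{2n} \sum_{i=1}^{n} \bigl( x_{ij} - z_i^{\top}\pi \bigr)^2
  + \lambda_{1} \lVert \pi \rVert_1 .
\]

% Stage 2: penalized least squares of y on the fitted values
% \hat{x}_{ij} = z_i^{\top}\hat{\pi}_j.
\[
  \hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^{p}}
  \frac{1}{2n} \sum_{i=1}^{n}
  \Bigl( y_i - \sum_{j=1}^{p} \hat{x}_{ij}\,\beta_j \Bigr)^2
  + \lambda_{2} \lVert \beta \rVert_1 .
\]

% Exact sparsity: the number of non-zero coefficients is bounded by an
% integer k_n that may grow with n, but slowly compared to n.
\[
  \lVert \beta \rVert_0 \le k_n .
\]
```

In this sketch both p and d may exceed n, which is why each stage carries a penalty term rather than relying on ordinary least squares.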

The main theoretical results of both essays are finite-sample bounds from which sufficient scaling conditions on the sample size for estimation consistency and variable selection consistency (i.e., the multi-stage high-dimensional estimation procedures correctly select the non-zero coefficients in the main equation with high probability) are established. Allowing the number of regressors in the main equation to exceed n raises a technical issue in these multi-stage estimation procedures regarding the so-called “restricted eigenvalue (RE) condition” for estimation consistency and the “mutual incoherence (MI) condition” for selection consistency, and this paper provides analysis to verify these RE and MI conditions. In particular, for the semiparametric sample selection model, these verifications also provide a finite-sample guarantee of the population identification condition required by the semiparametric sample selection models.

In the second essay, statistical efficiency of the proposed estimators is studied via lower bounds on minimax risks and the result shows that, for a family of models with exactly sparse structure on the coefficient vector in the main equation, one of the proposed estimators attains the smallest estimation error up to the (n, d, p)-scaling among a class of procedures in worst-case scenarios. Inference procedures for the coefficients of the main equation, one based on a pivotal Dantzig selector to construct non-asymptotic confidence sets and one based on a post-selection strategy (when perfect or near-perfect selection of the high-dimensional coefficients is achieved), are discussed. Other theoretical contributions of this essay include establishing the non-asymptotic counterpart of the familiar asymptotic “oracle” type of results from previous literature: the estimator of the coefficients in the main equation behaves as if the unknown nonparametric component were known, provided the nonparametric component is sufficiently smooth.
