Computational and Statistical Complexity of Learning in Sequential Models

Abstract

Recent success of machine learning is driven by scaling laws: larger architectures trained using more data and compute lead to more ``intelligent'' agents. Therefore, even minor improvements in the sample and compute complexity of these algorithms can have significant scientific and financial implications. In this dissertation, we study these questions in the context of sequential models. In particular, we study the following questions: (1) Computational-statistical gaps in reinforcement learning. In this part, we study the computational and statistical complexity of sequential decision-making under the framework of reinforcement learning. A fundamental assumption in the theory of reinforcement learning is ``RL with linear function approximation''. Under this assumption, the optimal value function (either $Q^*$, or $V^*$, or both) can be written as a linear combination of finitely many known basis functions. Even though the empirical benefits of linear function approximation were observed as early as 1963, only recently has a series of works designed sample-efficient algorithms for this setting. These works posed an important open problem: \emph{Can we design polynomial-time algorithms for this setting?} Here, we make progress on this open problem by proving that, unless NP = RP, no polynomial-time algorithm exists for this setting.
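To make the assumption concrete, linear $Q^*$ realizability is often stated as follows (a generic formulation in illustrative notation; the symbols $\phi$, $w^*$, and $d$ are not necessarily those used in the dissertation):
\[
Q^*(s, a) \;=\; \sum_{i=1}^{d} w^*_i \, \phi_i(s, a) \;=\; \langle w^*, \phi(s, a) \rangle ,
\]
where $\phi_1, \dots, \phi_d$ are known basis (feature) functions over state-action pairs and $w^* \in \mathbb{R}^d$ is an unknown weight vector. The hope is that the learner can find a near-optimal policy with sample and compute costs polynomial in $d$ rather than in the size of the state space; the hardness result above shows that, on the computational side, this hope cannot be realized unless NP = RP.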

(2) Computationally efficient algorithms for learning HMMs. In this part, we study the computational complexity of learning structured distributions over sequences of observations (DNA sequences, proteins, spoken words, and so on). In particular, we are concerned with the computational complexity of learning Hidden Markov Models (HMMs). Although HMMs are among the most widely used tools in sequential and time-series modeling, they are cryptographically hard to learn in the standard setting where one has access only to i.i.d. samples of observation sequences. Here, we show a positive result: a computationally efficient algorithm for learning HMMs when the learner has access to conditional samples from the target distribution. We also show that these results extend to ``low rank'' distributions.

(3) Understanding policy gradient methods in reinforcement learning. In this part, we study the algorithms most commonly used for sequential decision-making in practice: policy gradient methods. Even though these algorithms are simple to implement, their convergence properties have only been established at a relatively coarse level; in particular, the folklore guarantee is that these methods converge to a stationary point of the objective. Here, we present the first global convergence results for policy gradient methods such as vanilla policy gradient (with and without regularization) and natural policy gradient.
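For concreteness, the updates in question take the following generic form (illustrative notation, not necessarily that of the dissertation): with policy parameters $\theta$, expected return $J(\theta) = \mathbb{E}_{\pi_\theta}\!\left[\sum_{t \ge 0} \gamma^t r_t\right]$, and step size $\eta > 0$, vanilla policy gradient performs
\[
\theta_{k+1} \;=\; \theta_k + \eta \, \nabla_\theta J(\theta_k),
\]
while natural policy gradient preconditions the update with the Fisher information matrix $F(\theta)$ of $\pi_\theta$,
\[
\theta_{k+1} \;=\; \theta_k + \eta \, F(\theta_k)^{-1} \nabla_\theta J(\theta_k).
\]
The folklore guarantee only asserts that such iterates approach a stationary point of $J$; the results described above instead establish convergence to a globally optimal policy.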
