
UC Berkeley Electronic Theses and Dissertations

Analysis of Dictionary Learning and Random Forest under Data-inspired Models

No data is associated with this publication.
Abstract

Many machine learning algorithms have demonstrated powerful empirical performance in terms of prediction accuracy and the ability to extract knowledge from data. Inspired by this empirical success, researchers study the behavior of these algorithms under theoretical models. Two challenges must be addressed when trying to understand these algorithms from a theoretical perspective. First, a sound theoretical model is needed: a good model should reflect key properties of real data. If a model does not capture important aspects of real data, the observations and conclusions drawn from it may not be relevant to practice. Thus, theoretical models that reflect properties of real data are required so that the insights they yield are convincing to practitioners. Second, many of these algorithms are hard to analyze with traditional techniques, so novel techniques are required to analyze them under new theoretical models. In this thesis, we analyze two problems under novel, real-data-inspired theoretical models: $\ell_1$-minimization for dictionary learning, and recovering important features and feature interactions from Random Forests (RF). Both analyses give unique insights into their respective problems by studying a novel data generative model.

For dictionary learning, we propose two novel theoretical models: an exact sparse model and Bernoulli-type models. Unlike most previous analyses, which assume the data is generated from Gaussian distributions with sparsity constraints, these new models can capture non-Gaussian data distributions and allow us to analyze the algorithms under data properties such as non-negativity and heavy tails. We show that the $\ell_1$-minimization model in dictionary learning does not satisfy the classic global identifiability condition under the new models; however, the reference dictionary still enjoys a global property across all feasible dictionaries. Our theoretical analysis leads to a novel algorithm called Dictionary Learning Block Coordinate Descent (DL-BCD).

For RF, we start by analyzing the feature importance bias for noisy features when using Mean Decrease Impurity (MDI). We then study the feature interaction recovery problem and analyze the data-inspired Local-Spiky Sparse (LSS) model without the Lipschitz assumptions that are often present in the previous literature. We show that the depth-weighted prevalence of a true feature interaction in the decision paths of trees does not depend on the model coefficients but only on the size of the interaction. This analysis leads to a novel feature ranking method called LSSrank. We examine the performance of LSSrank on simulated data and show that, under the LSS model, it ranks true interactions at the top with high probability.
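For context, $\ell_1$-penalized dictionary learning is commonly approached by alternating between sparse coding and a dictionary update. The sketch below is a minimal, generic illustration of that scheme in NumPy (ISTA for the $\ell_1$-penalized codes, a least-squares dictionary update with column normalization); it is not the DL-BCD algorithm proposed in the thesis, and the function name, penalty `lam`, and iteration counts are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def dictionary_learning_l1(Y, n_atoms, lam=0.1, n_outer=50, n_ista=30, seed=0):
    """Alternate between l1-penalized sparse coding (ISTA) and a
    least-squares dictionary update with column renormalization."""
    rng = np.random.default_rng(seed)
    d, n = Y.shape
    D = rng.standard_normal((d, n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    X = np.zeros((n_atoms, n))
    for _ in range(n_outer):
        # Sparse coding: minimize 0.5*||Y - D X||_F^2 + lam*||X||_1 over X.
        L = max(np.linalg.norm(D, 2) ** 2, 1e-12)  # Lipschitz constant of the gradient
        for _ in range(n_ista):
            grad = D.T @ (D @ X - Y)
            X = soft_threshold(X - grad / L, lam / L)
        # Dictionary update: least squares in D, then renormalize columns.
        D = Y @ np.linalg.pinv(X)
        D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    return D, X
```

Similarly, the MDI feature importance analyzed for RF is the impurity-based score that scikit-learn reports by default for random forests. A minimal sketch of inspecting it on synthetic data containing pure-noise features (dataset sizes and hyperparameters below are arbitrary choices, not those used in the thesis) might look like:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data: a few informative features plus noise-only features.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

# feature_importances_ is the impurity-based (MDI) importance averaged over trees;
# noisy features can receive non-zero scores even when they carry no signal,
# which is the bias discussed in the abstract.
for i, imp in enumerate(rf.feature_importances_):
    print(f"feature {i}: MDI importance = {imp:.3f}")
```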


This item is under embargo until February 16, 2025.