Optimization for machine learning: Memory-efficient and tractable solutions to large-scale non-convex systems

UC Merced Electronic Theses and Dissertations


Creative Commons 'BY-SA' version 4.0 license
Abstract

Neural networks generally require large amounts of data to adequately model the domain space. In situations where the data are limited, the predictions from these models, which are typically obtained with stochastic gradient descent (SGD) minimization algorithms, can be poor. In addition, the data are commonly corrupted by a poor imaging apparatus. In these cases, more sophisticated optimization approaches and model architectures become crucial to increase the impact of each training iteration. Second-order methods capture curvature information, providing better-informed search directions and step lengths, but they require vast amounts of storage and can be computationally demanding.

To address the computational issue, we propose an optimization algorithm that uses second-derivative information, exploiting curvature to avoid saddle points. We take a Hessian-free approach: the second-derivative matrix is never stored explicitly, and a conjugate gradient method is applied that requires only Hessian-vector products. The algorithm uses a trust-region framework, which does not require the Hessian to be positive definite. We present numerical experiments that demonstrate the improvement in classification accuracy of our proposed approach over a standard SGD approach.

We then propose a limited-memory symmetric rank-one (L-SR1) quasi-Newton approach, which further reduces the time and space complexity. The approach allows indefinite Hessian approximations, enabling directions of negative curvature to be exploited. Furthermore, we use a modified adaptive regularization using cubics (ARC) approach, which generates a sequence of cubic subproblems with closed-form solutions for suitable regularization choices, and we compare the performance of our approach against state-of-the-art first-order and other quasi-Newton methods. To bring the benefits of an exponential-moving-average algorithm to a quasi-Newton approach, we propose a quasi-Adam method. Judicious choices of quasi-Newton matrices can guarantee descent in the objective function and improve convergence; we integrate search directions obtained from these quasi-Newton Hessian approximations with the Adam optimization algorithm. We provide convergence guarantees and demonstrate improved performance through extensive experiments on a variety of applications.

Finally, to mitigate the issue of data corruption, we propose a variety of architectures for applications in image processing. We propose a blind source-signal separator that separates image signals superimposed by a common observing apparatus. We propose novel deep learning architectures for denoising low-photon-count images corrupted by Gaussian noise, and a novel architecture for low-photon-count, downsampled imaging, where the signal is corrupted by Gaussian and Poisson noise and then downsampled. Lastly, we propose a novel adversarial detection method for white-box attacks using radial basis functions and the discrete cosine transform.
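The Hessian-free, trust-region idea summarized above can be made concrete in a few lines. The sketch below is a simplified illustration rather than the dissertation's implementation: it uses PyTorch autograd to form Hessian-vector products (so the Hessian is never stored) and a Steihaug-type truncated conjugate gradient loop for the trust-region subproblem. All function names, tolerances, and iteration limits are illustrative assumptions.

```python
import torch

def hessian_vector_product(loss, params, vec):
    """Hessian-vector product via two backward passes (Pearlmutter's trick);
    the Hessian itself is never formed or stored."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    hv = torch.autograd.grad(flat_grad @ vec, params, retain_graph=True)
    return torch.cat([h.reshape(-1) for h in hv])

def steihaug_cg(loss, params, grad, radius, max_iter=10, tol=1e-6):
    """Truncated CG for the trust-region subproblem
    min_p  g^T p + 0.5 p^T H p  subject to  ||p|| <= radius.
    Negative curvature or leaving the region stops at the boundary,
    so H need not be positive definite."""
    p = torch.zeros_like(grad)
    r = grad.clone()          # residual H p + g, equals g at p = 0
    d = -r                    # first search direction
    for _ in range(max_iter):
        Hd = hessian_vector_product(loss, params, d)
        dHd = d @ Hd
        if dHd <= 0:          # negative curvature: step to the boundary
            return _to_boundary(p, d, radius)
        alpha = (r @ r) / dHd
        p_next = p + alpha * d
        if p_next.norm() >= radius:
            return _to_boundary(p, d, radius)
        r_next = r + alpha * Hd
        if r_next.norm() < tol:
            return p_next
        beta = (r_next @ r_next) / (r @ r)
        d = -r_next + beta * d
        p, r = p_next, r_next
    return p

def _to_boundary(p, d, radius):
    """Step from p along d to the trust-region boundary ||p + tau d|| = radius."""
    a, b, c = d @ d, 2 * (p @ d), p @ p - radius ** 2
    tau = (-b + torch.sqrt(b * b - 4 * a * c)) / (2 * a)
    return p + tau * d
```

In a training loop, grad would be the flattened gradient of the loss with respect to the parameters, and the returned step would be reshaped back into the parameter tensors and accepted or rejected by the usual trust-region ratio test.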

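Both the limited-memory quasi-Newton and quasi-Adam contributions rest on the symmetric rank-one (SR1) update, which, unlike BFGS, does not force the approximation to stay positive definite and can therefore capture negative curvature. Below is a minimal NumPy sketch of the full-memory SR1 update with the standard skip rule; the limited-memory (L-SR1), cubic-regularized, and Adam-integrated variants studied in the dissertation build on this formula but are not reproduced here, and the tolerance name is an illustrative assumption.

```python
import numpy as np

def sr1_update(B, s, y, skip_tol=1e-8):
    """SR1 update of a Hessian approximation B from a parameter step s and
    gradient change y. The resulting B can be indefinite, exposing directions
    of negative curvature. The update is skipped when the denominator is too
    small relative to ||s|| ||y - Bs||, the standard safeguard."""
    r = y - B @ s
    denom = r @ s
    if abs(denom) < skip_tol * np.linalg.norm(s) * np.linalg.norm(r):
        return B  # skip: update would be numerically unstable
    return B + np.outer(r, r) / denom

# Tiny usage example on a quadratic f(x) = 0.5 x^T A x with an indefinite A.
A = np.array([[2.0, 0.0], [0.0, -1.0]])      # one negative eigenvalue
B = np.eye(2)                                # initial approximation
x0, x1 = np.array([1.0, 1.0]), np.array([0.5, 1.5])
s, y = x1 - x0, A @ x1 - A @ x0              # gradient of f is A x
B = sr1_update(B, s, y)
print(np.linalg.eigvalsh(B))                 # now contains a negative eigenvalue
```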