New computational and statistical characterizations of neural network learning




Gollakota, Aravind




A foundational goal of machine learning theory is to characterize the inherent computational and statistical complexity of some of the most basic tasks in machine learning. In this thesis, we present new results concerning two such tasks in neural network learning and beyond.

First, we study the question of when efficient algorithms can achieve high test accuracy on labeled data known to be consistent with a simple neural network. We present a set of results establishing the surprising computational intractability of this problem even in the benign setting where the inputs are drawn from a Gaussian and the labels are perfectly consistent with a simple two-hidden-layer or even one-hidden-layer neural network. These hardness results illuminate what types of problem assumptions are necessary for efficient algorithms to be possible at all.

Next, we investigate the problem of testing whether a learning algorithm has fit the data as well as its theoretical guarantees claim. This is a serious issue for agnostic supervised learning (i.e., supervised learning with no assumptions on the labels), where most efficient algorithms make simplifying distributional assumptions such as Gaussianity. Such assumptions can be hard to verify, which makes it hard to check whether the learner has actually succeeded. The recent elegant model of testable learning addresses this issue by replacing hard-to-verify distributional assumptions with efficiently testable ones. We present both a broad algorithmic framework and a full statistical characterization of this model.
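To make the two settings above concrete, here is a minimal, hypothetical sketch (NumPy; all dimensions, names, and the tolerance are illustrative choices, not from the thesis): data whose inputs are drawn from a standard Gaussian and whose labels are perfectly consistent with a one-hidden-layer ReLU network, followed by a toy "tester" in the spirit of testable learning that replaces the Gaussianity assumption with an efficient empirical check on low-order moments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: input dimension, hidden units, sample count.
d, k, n = 10, 3, 1000

# Unknown "teacher" network: one hidden ReLU layer, linear output layer.
W = rng.standard_normal((k, d))  # hidden-layer weights
a = rng.standard_normal(k)       # output-layer weights

# The benign setting: Gaussian inputs, labels exactly realized by the teacher.
X = rng.standard_normal((n, d))
y = np.maximum(X @ W.T, 0.0) @ a  # y = a . ReLU(Wx), no label noise

def looks_gaussian(X, tol=0.25):
    """Toy testable-learning-style check: accept only if the empirical
    mean and second moments of the sample are close to those of N(0, I).
    (The actual tests in the literature are more refined; this is a sketch.)"""
    mu = X.mean(axis=0)
    Sigma = (X.T @ X) / len(X)
    return bool(np.abs(mu).max() < tol
                and np.abs(Sigma - np.eye(X.shape[1])).max() < tol)
```

A learner in this model would run a check like `looks_gaussian` on the sample and only certify its output when the check passes, so its guarantee never silently relies on an unverified distributional assumption.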

