Towards provably efficient algorithms for learning neural networks

Date

2020-07-06

Authors

Goel, Surbhi

Abstract

Neural networks (NNs) have seen a surge in popularity due to their unprecedented practical success in fields such as computer vision, robotics, and natural language processing. Developing provably efficient algorithms for learning commonly used neural network architectures continues to be a core challenge in understanding deep learning. In particular, even the problem of learning very basic architectures remains open. The underlying difficulty arises from the highly non-convex nature of the optimization problems posed by neural networks.

Despite their practical success, the standard neural network training algorithms based on gradient descent (GD) and its variants have almost no provable guarantees. This necessitates a paradigm shift towards developing new, principled algorithms with provable guarantees. In this thesis, we give the first set of efficient algorithms for learning commonly studied neural network architectures under minimal assumptions. In the first part of the thesis, we focus on characterizing the computational complexity of learning a single non-linear unit. We combine techniques from kernel methods and polynomial approximation to give the first dimension-efficient algorithm for learning a single ReLU (rectified linear unit), the most popular activation function, in the agnostic learning model (arbitrary noise) for any distribution on the unit sphere. We further show that if the input distribution is assumed to be Gaussian, the problem remains computationally hard. Our results unconditionally imply that GD cannot agnostically learn a single ReLU. Lastly, we show that if we relax our learning guarantee, then there is a fully polynomial-time algorithm that achieves a constant-factor approximation for all isotropic log-concave distributions.
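
To make the kernel-plus-polynomial-approximation idea concrete, the following is a minimal illustrative sketch, not the thesis's algorithm: it fits noisy ReLU labels on the unit sphere by kernel ridge regression with a bounded-degree polynomial kernel, i.e., it implicitly learns a low-degree polynomial surrogate for the ReLU. The degree, regularization, sample size, and synthetic data are assumptions made for illustration.

    import numpy as np

    # Illustrative sketch: approximate a single ReLU on the unit sphere by a
    # low-degree polynomial, fitted implicitly via kernel ridge regression with
    # a polynomial kernel. All parameters below (degree, regularization, sample
    # size, noise level) are hypothetical choices, not the thesis's parameters.
    rng = np.random.default_rng(0)
    d, n, degree, lam = 20, 2000, 8, 1e-3

    # Inputs on the unit sphere; labels are a ReLU of an unknown direction plus noise.
    X = rng.normal(size=(n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    w_star = rng.normal(size=d)
    w_star /= np.linalg.norm(w_star)
    y = np.maximum(X @ w_star, 0.0) + 0.1 * rng.normal(size=n)

    # Bounded-degree polynomial kernel k(x, x') = (1 + <x, x'>)^degree.
    K = (1.0 + X @ X.T) ** degree
    alpha = np.linalg.solve(K + lam * np.eye(n), y)  # kernel ridge regression

    def predict(X_new):
        return ((1.0 + X_new @ X.T) ** degree) @ alpha

    print("train mse:", np.mean((predict(X) - y) ** 2))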

We further extend our results to shallow NNs. We give the first dimension-efficient algorithm for learning norm-bounded one-hidden-layer fully connected NNs. We subsequently show that if the marginal distribution on the input exhibits sufficient eigenvalue decay (low-dimensional structure), then one-hidden-layer NNs can be learned in time polynomial in all parameters. For one-hidden-layer convolutional NNs, we propose a simple iterative algorithm that efficiently recovers the underlying parameters for commonly used convolutional schemes from computer vision. We further give the first polynomial-time algorithm for networks with more than one hidden layer in a weaker noise model. The techniques from this work also yield improved results for problems related to Boolean concept learning.
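
For reference, the shallow architectures discussed above can be written in a standard form; the notation here is ours, chosen for illustration rather than taken from the thesis. A one-hidden-layer fully connected network with k hidden ReLU units computes

    f(x) = \sum_{i=1}^{k} a_i \, \sigma(w_i \cdot x), \qquad \sigma(z) = \max(z, 0),

while a one-hidden-layer convolutional network with a single shared filter w and (for concreteness) average pooling computes

    f(x) = \frac{1}{k} \sum_{i=1}^{k} \sigma(w \cdot x_{P_i}),

where x_{P_1}, \dots, x_{P_k} denote the patches of the input selected by the convolutional scheme.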

Lastly, we shift focus to the unsupervised learning setting through the lens of graphical models. We study Restricted Boltzmann Machines (RBMs) which are simple generative neural networks that model a probability distribution. We give the first algorithm for learning RBMs with non-negative interactions under arbitrary biases on binary as well as non-binary input.
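
As background, a standard RBM over visible units v \in \{0,1\}^n and hidden units h \in \{0,1\}^m (notation ours, not taken from the thesis) defines the joint distribution

    p(v, h) \propto \exp\left( v^\top J h + b^\top v + c^\top h \right),

with the modeled distribution over v obtained by summing out h. "Non-negative interactions" corresponds to the entrywise constraint J_{ij} \ge 0, while the bias vectors b and c may be arbitrary.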
