# Algorithms for recovery in the presence of outliers

## Abstract

The problem of fitting a model to noisy data is fundamental to statistics and machine learning. In this thesis, we focus on the case where some fraction of the data consists of arbitrary outliers. Classical robust statistics studied this problem but, until very recently, focused on finding estimators that minimize sample complexity; even simple problems such as Gaussian mean estimation were thought to be potentially computationally hard. In 2016, [DKK⁺16a] showed that robust Gaussian mean estimation is possible in polynomial time, which sparked a flurry of work on computational robust statistics. This thesis contributes to this area by studying regression and parameter estimation problems in the presence of outliers.

In the context of regression, this thesis explores three problems:

1. Univariate Polynomial Regression. We demonstrate the first polynomial-time algorithm to recover univariate polynomials when up to half of the measurements are outliers. The algorithm is optimal in terms of sample complexity, runtime, and the fraction of corruptions tolerated.
2. High-Dimensional Sparse Linear Regression. We demonstrate the first polynomial-time algorithm that achieves three properties simultaneously: it estimates sparse as well as dense signals; it tolerates a large constant fraction of outliers (24%); and it tolerates adversarial rather than distributional (e.g., Gaussian) dense noise.
3. List-Decodable High-Dimensional Linear Regression. We demonstrate the first polynomial-time algorithm to recover high-dimensional linear signals when more than 50% of the measurements are corrupted. The algorithm outputs a constant-sized list of candidates, one of which is close to the true signal.

In the context of parameter estimation, we show:

1. Robust Mean Estimation and Robust Sparse PCA. We design the first potentially practical algorithms to perform sparse mean estimation and sparse PCA in the presence of a constant fraction of outliers.
2. Robust Clustering of Gaussian Mixtures. We design the first polynomial-time algorithm that recovers a mixture of k Gaussians separated in total variation distance in the presence of a constant fraction of outliers. Separation in total variation distance is qualitatively the weakest possible notion of separation required for recovery.
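To illustrate why outliers are a problem for even the simplest estimation task, the following toy sketch contrasts the naive sample mean with a simple iterative filter that repeatedly discards the points farthest from the current estimate. This is only a caricature of filtering-style robust estimation; the dimension, contamination level, outlier placement, and removal schedule are all illustrative assumptions, not parameters from this thesis or from [DKK⁺16a].

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eps = 50, 2000, 0.1  # dimension, sample size, outlier fraction (assumed)

# Inliers are drawn around the true mean; outliers form a far-away cluster.
true_mean = np.ones(d)
inliers = rng.normal(true_mean, 1.0, size=(int(n * (1 - eps)), d))
outliers = rng.normal(20.0, 1.0, size=(int(n * eps), d))
X = np.vstack([inliers, outliers])

def naive_mean(X):
    """The ordinary sample mean: a single distant cluster drags it far off."""
    return X.mean(axis=0)

def filtered_mean(X, eps, iters=10):
    """Toy filter: each round, drop the small fraction of points farthest
    from the current mean, then re-estimate."""
    X = X.copy()
    for _ in range(iters):
        mu = X.mean(axis=0)
        dist = np.linalg.norm(X - mu, axis=1)
        keep = dist <= np.quantile(dist, 1 - eps / iters)
        X = X[keep]
    return X.mean(axis=0)

err_naive = np.linalg.norm(naive_mean(X) - true_mean)
err_filtered = np.linalg.norm(filtered_mean(X, eps) - true_mean)
```

On this synthetic instance the filtered estimate lands far closer to the true mean than the naive one. Real filter algorithms decide what to remove using spectral information (the top eigenvector of the empirical covariance) rather than raw distances, which is what makes them robust against cleverly placed, non-obvious outliers.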