Learning with latent structures, robustness and non-linearity : non-convex approaches
MetadataShow full item record
Non-convex optimization based algorithms are ubiquitous in machine learning and statistical estimation, especially in dealing with complex models that are noisy, non-linear or contain latent structures. Popular techniques for non-convex problems such as alternating minimization and gradient descent, are typically easy to derive and use, but mostly lack theoretical foundations. It is difficult to obtain statistical consistency guarantees for these non-convex heuristics due to the existence of local minima in non-convex problems. From algorithmic perspective, it is even unclear whether average cases of some learning problems, which typically involve NP-hardness in the worst case, can be solved in polynomial time. This thesis seeks to tackle these challenges for a few statistical learning problems: mixed linear regression, robust principal component analysis, single index models in both standard and high-dimensional settings. We provide novel non-convex approaches for these problems via exploring theoretical understandings of several techniques favored in practice, including alternating minimization, EM, and non-convex gradient descent. We also present (optimal) statistical consistency guarantees and scalable computational complexities for the proposed algorithms. Our results suggest that in the problems we study, the aforementioned non-convex heuristics can quickly converge to global optimum (or statistically comparable solutions) when initialized within certain local regions around the underlying true parameters.