Constrained estimation via the fused lasso and some generalizations
MetadataShow full item record
This dissertation studies structurally constrained statistical estimators. Two entwined main themes are developed: computationally efficient algorithms, and strong statistical guarantees of estimators across a wide range of frameworks. In the first chapter we discuss a unified view of optimization problems that enforces constrains, such as smoothness, in statistical inference. This in turn helps to incorporate spatial and/or temporal information about data. The second chapter studies the fused lasso, a non-parametric regression estimator commonly used for graph denoising. This has been widely used in applications where the graph structure indicates that neighbor nodes have similar signal values. I prove for the fused lasso on arbitrary graphs, an upper bound on the mean squared error that depends on the total variation of the underlying signal on the graph. Moreover, I provide a surrogate estimator that can be found in linear time and attains the same upper–bound. In the third chapter I present an approach for penalized tensor decomposition (PTD) that estimates smoothly varying latent factors in multiway data. This generalizes existing work on sparse tensor decomposition and penalized matrix decomposition, in a manner parallel to the generalized lasso for regression and smoothing problems. I present an efficient coordinate-wise optimization algorithm for PTD, and characterize its convergence properties. The fourth chapter proposes histogram trend filtering, a novel approach for density estimation. This estimator arises from looking at surrogate Poisson model for counts of observations in a partition of the support of the data. The fifth chapter develops a class of estimators for deconvolution in mixture models based on a simple two-step bin-and-smooth procedure, applied to histogram counts. The method is both statistically and computationally efficient. By exploiting recent advances in convex optimization,we are able to provide a full deconvolution path that shows the estimate for the mixing distribution across a range of plausible degrees of smoothness, at far less cost than a full Bayesian analysis. Finally, the sixth chapter summarizes my contributions and provides possible directions for future work.