Mining structured matrices in high dimensions
Structured matrices are matrix-valued data that lie on an inherent lower-dimensional manifold, with fewer degrees of freedom than the ambient or observed dimensions. Such hidden (or latent) structure allows for statistically consistent estimation in high-dimensional settings, in which the number of observations is much smaller than the number of parameters to be estimated. This dissertation makes contributions to statistical models, algorithms, and applications of structured matrix estimation in high-dimensional settings. The proposed estimators and algorithms are motivated by and evaluated on applications in e-commerce, healthcare, and neuroscience. In the first line of contributions, substantial generalizations of existing results are derived for the widely studied problem of matrix completion. Tractable estimators with strong statistical guarantees are developed for matrix completion under (a) generalized observation models that subsume heterogeneous data types, such as count and binary data, and heterogeneous noise models beyond additive Gaussian noise, (b) general structural constraints beyond low-rank assumptions, and (c) collective estimation from multiple sources of data. The second line of contributions focuses on algorithmic and application-specific ideas for generalized structured matrix estimation. Two specific applications of structured matrix estimation are discussed: (a) a constrained latent factor estimation framework that extends the ideas and techniques hitherto discussed and applies them to the task of learning clinically relevant phenotypes from Electronic Health Records (EHRs), and (b) a novel, efficient, and highly generalized algorithm for collaborative learning to rank (LETOR) applications.