Structured low complexity data mining




Jo, Jason

Journal Title

Journal ISSN

Volume Title



Due to the rapidly increasing dimensionality of modern datasets many classical approximation algorithms have run into severe computational bottlenecks. This has often been referred to as the “curse of dimensionality.” To combat this, low complexity priors have been used as they enable us to design efficient approximation algorithms which are capable of scaling up to these modern datasets. Typically the reduction in computational complexity comes at the expense of accuracy. However, the tradeoffs have been relatively advantageous to the computational scientist. This is typically referred to as the “blessings of dimensionality.” Solving large underdetermined systems of linear equations has benefited greatly from the sparsity low complexity prior. A priori, solving a large underdetermined system of linear equations is severely ill-posed. However, using a relatively generic class of sampling matrices, assuming a sparsity prior can yield a well-posed linear system of equations. In particular, various greedy iterative approximation algorithms have been developed which can recover and accurately approximate the k-most significant atoms in our signal. For many engineering applications, the distribution of the top k atoms is not arbitrary and itself has some further structure. In the first half of the thesis we will be concerned with incorporating some a priori designed weights to allow for structured sparse approximation. We provide performance guarantees and numerically demonstrate how the appropriate use of weights can yield a simultaneous reduction in sample complexity and an improvement in approximation accuracy. In the second half of the thesis we will consider the collaborative filtering problem, specifically the task of matrix completion. The matrix completion problem is likewise severely ill-posed but with a low rank prior, the matrix completion problem with high probability admits a unique and robust solution via a cadre of convex optimization solvers. The drawback here is that the solvers enjoy strong theoretical guarantees only in the uniform sampling regime. Building upon recent work on non-uniform matrix completion, we propose a completely expert-free empirical procedure to design optimization parameters in the form of positive weights which allow for the recovery of arbitrarily sampled low rank matrices. We provide theoretical guarantees for these empirically learned weights and present numerical simulations which again show that encoding prior knowledge in the form of weights for optimization problems can again yield a simultaneous reduction in sample complexity and an improvement in approximation accuracy.




LCSH Subject Headings