Matrix nearness problems in data mining
This thesis addresses some fundamental problems in data mining and machine learning that may be cast as matrix nearness problems. Some exam- ples of well-known nearness problems are: low-rank approximations, sparse approximations, clustering, co-clustering, kernel learning, and independent components analysis. In this thesis we study two types of matrix nearness problems. In the first type, we compute a low-rank matrix approximation to a given input matrix, thereby representing it more efficiently and hopefully discovering the latent structure within the input data. In the second kind of nearness problem we seek to either learn a parameterized model of/from the input data, or the data represents noisy measurements of some underly- ing objects and we wish to recover the original measurements. Both types of problems can be naturally approached by computing an output model/matrix that is “near” the input. The specific nearness problems that we study in this thesis include: i) nonnegative matrix approximation (NNMA), ii) incremental low-rank matrix approximations, iii) general low-rank matrix approximations via convex op- timization, iv) learning a parametric mixture model for data, specifically for directional data, and v) metric nearness. NNMA is a recent powerful matrix decomposition technique that ap- proximates a nonnegative input matrix by a low-rank approximation composed of nonnegative factors. It has found wide applicability across a broad spec- trum fields, ranging from problems in text analysis, image processing, and gene microarray analysis, to music transcription. We develop several new gen- eralizations to the NNMA problem and derive efficient iterative algorithms for computing the associated approximation. Furthermore, we also provide efficient software which implements many of the derived algorithms. With growing input matrix sizes, sometimes low-rank approximation techniques themselves can become computationally expensive. For such situa- tions, and to aid model selection (the rank of the approximation), we develop incremental versions of low-rank matrix approximations, where the approxi- mation is obtained one rank at a time. There are several applications of such a scheme, for example, topic discovery from a collection of documents. We also develop methods based on large-scale convex optimization for computing low-rank approximations to the input data. Our approach can deal with large scale data, while permitting incorporation of constraints more general than nonnegativity if desired. Our approach has some beneficial byproducts—it yields new methods for solving the nonnegative least squares problem, as well as l1-norm regression. The next nearness problem that we look at is that of learning a para- metric probabilistic mixture model for the data. Here one estimates a param- eter matrix given the input data, where the estimation process is implicitly regularized to avoid over-fitting. In particular we solve the parameter estimation problem for two fundamental high-dimensional directional distributions, namely the von Mises-Fisher and Watson distributions. Parameter estimation for these distributions is highly non-trivial and we present efficient methods for it. The final nearness problem that we study is a more typical matrix nearness problem, which is called metric nearness. The goal here is to find a distance matrix (i.e., a matrix whose entries satisfy the triangle inequality) that is “nearest” to an input matrix of dissimilarity values. For most of the algorithms that we develop in this thesis, we also pro- vide software that implements them. This software will be of use to both researchers and practitioners who want to experiment with our algorithms.