Graph analytics and subset selection problems in machine learning
In this dissertation we examine two topics relevant to modern machine learning research: 1) Subgraph counting and 2) High-dimensional subset selection. The former can be used to construct features for performing graph analytics. The latter has applications in sparse modeling such as feature selection, sparse regression, and interpretable machine learning. Since these problems become intractable for large datasets, we design efficient approximation algorithms for both tasks with data-dependent performance guarantees.
In the first part of the dissertation, we study the problem of approximating all three and four node induced subgraphs in a large graph. These counts are called the 3 and 4-profile, respectively, and describe a graph's connectivity properties. This problem generalizes graphlet counting and has found applications ranging from bioinformatics to spam detection. Our algorithms use the novel concept of graph profile sparsifiers: sparse graphs that can be used to approximate the full profile counts for a given large graph. We obtain novel concentration results showing that graphs can be substantially sparsified and still retain good approximation quality for the global graph profile. We also study the problem of counting local and ego profiles centered at each vertex of the graph. These quantities embed every vertex into a low-dimensional space that characterizes the local geometry of its neighborhood. We introduce the concept of edge pivots and show that all local 3 and 4-profiles can be computed as vertex programs using compressed two-hop information. Our algorithms are local, distributed message-passing schemes and compute all graph profiles in parallel. We empirically evaluate these algorithms with a distributed GraphLab implementation, and show improvements over previous state-of-the-art in experiments scaling up to 640 cores on Amazon EC2.
In the second part we shift to the problem of subset selection: for example, selecting a few features from a large feature set. Motivated by the need for interpretable, nonlinear regression models for high-dimensional data, we draw a novel connection between this and submodular maximization. We extend an earlier concept of weak submodularity from the setting of sparse linear regression to a broad class of objective functions, including generalized linear model likelihoods. We then show that three greedy algorithms (Oblivous, Forward Stepwise, and Orthogonal Matching Pursuit) perform within a constant factor from the best possible subset. Our methods do not require any statistical modeling assumptions and allow direct control over the number of obtained features. This contrasts with other work that uses regularization parameters to control sparsity only implicitly. Our proof technique connecting convex analysis and submodular set function theory may be of independent interest for other statistical learning applications that have combinatorial structure.
In the third part, we consider the problem of explaining the predictions of a given black-box classifier. For example, why does a deep neural network assign an image to a particular class? We cast interpretability of black-box classifiers as a subset selection problem and propose to solve it with an efficient streaming algorithm. We provide a constant factor approximation guarantee for this algorithm in the case of a random stream order and a weakly submodular objective function. This is the first such theoretical guarantee for this general class of functions, and we also show that no such algorithm exists for a worst case stream order. Our algorithm obtains similar explanations of Inception V3 predictions 10 times faster than the state-of-the-art LIME framework.