Using global low-rank kernel matrix approximations in machine learning and uncertainty quantification
Abstract
Kernel methods are a broad class of algorithms applied across many fields of scientific computing. In this thesis we focus on applying kernel methods to supervised learning in machine learning and to uncertainty quantification of learning algorithms. Kernel methods offer an interpretable way to model nonlinear functions, but they are difficult to scale because of their computational cost. As a result, kernels are underutilized as tools for solving supervised learning problems on large data and for measuring uncertainty in learning algorithms. We first develop the understanding and methods needed to scale kernel methods effectively for supervised learning, and we then use those results to inform uncertainty quantification of machine learning algorithms in medical image segmentation.

The primary challenge in scaling kernel methods is computing and applying the kernel matrix, so approximating this matrix is a crucial step for all kernel methods. The main approximation studied in this thesis is the Nyström method, a popular, easy-to-implement method that uses randomized sampling to construct a low-rank factorization of the matrix. We implement a parallel version of Nyström and use it to study properties of large kernel matrices, and we offer both theoretical and empirical comparisons of Nyström and treecodes (a more sophisticated matrix approximation technique); these results identify the set of problems to which Nyström is best suited.

Applying such an approximation to a kernel learning problem generally yields a convex optimization problem, but to date only gradient-based solvers have been scaled effectively. We explore the efficacy of methods that use second-derivative information by deriving a novel formulation of, and implementing a solver for, a supervised classification method called kernel logistic regression (KLR). Our work combines Nyström with an inexact Newton solver to scale to large datasets and outperform a state-of-the-art gradient solver.

We apply this work to medical image segmentation, specifically the task of segmenting glioblastoma (GBM) in brain magnetic resonance imaging (MRI) scans. Segmenting GBM is an important task for tumor prognosis and research, but segmenting an image by hand is a time-consuming process. As a result, researchers have developed sophisticated methods to automate it; deep neural networks (DNNs) can now achieve results close to human accuracy. A DNN can be viewed as the composition of a highly nonlinear, many-million-parameter representation of the data with a logistic regression applied per pixel to the resulting features. However, DNNs do not offer informative uncertainty estimates, which limits their adoption. We address this lack of uncertainty by using KLR to replace the last layer of the DNN. This approximation yields significantly smoother probability maps and, while reducing the parameter space by orders of magnitude, does not severely weaken the performance of the algorithm. The main benefit of our approach is that it provides an easy-to-sample, approximate posterior distribution over the KLR weights. We investigate the structure of this posterior and demonstrate that the uncertainty estimates produced by sampling it empirically approximate the DNN's errors well.
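As a concrete illustration of the Nyström method described above, the following is a minimal sketch in Python/NumPy. It assumes a Gaussian (RBF) kernel and uniform column sampling; both are illustrative choices, the function names are hypothetical, and the thesis's parallel implementation and sampling scheme are not reproduced here.

import numpy as np

def gaussian_kernel(X, Y, h=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X and the rows of Y.
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sq / (2.0 * h**2))

def nystrom(X, m, h=1.0, rng=np.random.default_rng(0)):
    # Rank-m Nystrom factorization K ~= C @ Wpinv @ C.T built from m
    # uniformly sampled landmark columns (uniform sampling is illustrative).
    idx = rng.choice(X.shape[0], size=m, replace=False)
    C = gaussian_kernel(X, X[idx], h)   # n x m slab of the kernel matrix
    W = C[idx]                          # m x m block at the sampled rows
    return C, np.linalg.pinv(W)

# The matrix-vector product K @ v can then be applied in O(n*m) time
# without ever forming the full n x n kernel matrix:
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
C, Wpinv = nystrom(X, m=50)
v = np.ones(1000)
Kv_approx = C @ (Wpinv @ (C.T @ v))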
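The KLR solver can be sketched in the same spirit. The sketch below reuses gaussian_kernel from the previous example, maps the data to Nyström features (one standard way to use the factorization in a linear model), and fits binary KLR with SciPy's Newton-CG, an off-the-shelf inexact Newton method that needs only Hessian-vector products; the thesis's own formulation and solver may differ in detail.

import numpy as np
from scipy.optimize import minimize

def nystrom_features(X, m, h=1.0, rng=np.random.default_rng(0)):
    # Map each point to m features so that Phi @ Phi.T ~= K.
    idx = rng.choice(X.shape[0], size=m, replace=False)
    C = gaussian_kernel(X, X[idx], h)     # reused from the sketch above
    s, U = np.linalg.eigh(C[idx])         # W = U diag(s) U.T
    s = np.maximum(s, 1e-12)              # guard against tiny eigenvalues
    return C @ (U / np.sqrt(s)) @ U.T     # Phi = C @ W^(-1/2)

def train_klr(Phi, y, lam=1e-3):
    # Binary KLR: regularized logistic loss on the Nystrom features,
    # minimized with Newton-CG, an inexact Newton method that uses only
    # Hessian-vector products. Labels y are in {0, 1}.
    n, m = Phi.shape
    def loss(w):
        z = Phi @ w
        return np.sum(np.logaddexp(0.0, z) - y * z) / n + 0.5 * lam * w @ w
    def grad(w):
        p = 1.0 / (1.0 + np.exp(-(Phi @ w)))
        return Phi.T @ (p - y) / n + lam * w
    def hessp(w, v):
        p = 1.0 / (1.0 + np.exp(-(Phi @ w)))
        return Phi.T @ (p * (1.0 - p) * (Phi @ v)) / n + lam * v
    res = minimize(loss, np.zeros(m), jac=grad, hessp=hessp, method="Newton-CG")
    return res.x

Because the Hessian is applied only through matrix-vector products with the n x m matrix Phi, each Newton step costs O(n*m) per inner CG iteration rather than O(n^2), which is what makes second-order information affordable at scale.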
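The abstract does not specify how the approximate posterior over the KLR weights is constructed. One standard construction that matches the "easy-to-sample" description is a Laplace (Gaussian) approximation around the MAP weights, sketched below under that assumption; Phi, y, and lam are as in the previous example, and w_map stands for the output of train_klr.

import numpy as np

def laplace_posterior_samples(Phi, y, w_map, lam, n_samples=100,
                              rng=np.random.default_rng(0)):
    # Laplace approximation to the KLR weight posterior: w ~ N(w_map, H^-1),
    # where H is the Hessian of the regularized negative log-likelihood
    # evaluated at the MAP point w_map.
    n, m = Phi.shape
    p = 1.0 / (1.0 + np.exp(-(Phi @ w_map)))
    H = Phi.T @ (Phi * (p * (1.0 - p))[:, None]) / n + lam * np.eye(m)
    L = np.linalg.cholesky(np.linalg.inv(H))   # L @ L.T = H^-1
    z = rng.normal(size=(m, n_samples))
    return w_map[:, None] + L @ z              # m x n_samples weight draws

# Per-pixel uncertainty can then be estimated as the spread of predicted
# probabilities across the sampled weights, e.g. (Phi_test, W_samples are
# illustrative names):
#   probs = 1.0 / (1.0 + np.exp(-(Phi_test @ W_samples)))  # n_test x n_samples
#   uncertainty = probs.std(axis=1)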