Machine learning via particle optimization
A variety of machine learning problems can be viewed in a unified way as optimizing a set of variables that are invariant to permutation, which we call particles. This includes, for example, finding particle-based approximations of intractable distributions for uncertainty quantification and Bayesian inference, and learning mixture models or neural networks in which the mixture components or the neurons in the same layer are permutable. In this dissertation, we take this unified particle optimization view and develop a variety of novel steepest descent algorithms for particle optimization, providing powerful tools for various challenging tasks, including scalable and automatic approximate inference, learning diversified mixture models, and energy-efficient neural architecture optimization.
Part I: By viewing sampling from distributions as optimization of particles, we develop a non-parametric, particle-based variational inference algorithm for approximate inference of intractable distributions. Our algorithm works by iteratively moving a set of particles to form an increasingly better approximation of a given target distribution, in the sense that the KL divergence between the empirical particle distribution and the target distribution is minimized. This approach greatly increases the scalability of Bayesian inference for both big data and complex models.
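To make the idea of moving particles toward a target concrete, the following is a minimal sketch and not the dissertation's implementation. The abstract does not state the update rule, so the sketch assumes an update in the style of Stein variational gradient descent (SVGD) with a fixed-bandwidth RBF kernel on one-dimensional particles. The function names, the step size, the bandwidth, and the Gaussian target N(2, 1) are all illustrative assumptions.

```python
import math

def svgd_step(particles, grad_log_p, step_size=0.1, bandwidth=1.0):
    """One SVGD-style update for 1-D particles (assumed rule, fixed RBF bandwidth)."""
    n = len(particles)
    updated = []
    for xi in particles:
        direction = 0.0
        for xj in particles:
            k = math.exp(-((xj - xi) ** 2) / (2.0 * bandwidth ** 2))
            # Driving term: kernel-weighted score pulls xi toward high density.
            direction += k * grad_log_p(xj)
            # Repulsive term (gradient of the kernel w.r.t. xj): keeps particles apart.
            direction += k * (xi - xj) / bandwidth ** 2
        updated.append(xi + step_size * direction / n)
    return updated

def run_svgd(particles, grad_log_p, n_steps=1000, step_size=0.1, bandwidth=1.0):
    """Apply the update repeatedly and return the final particle positions."""
    for _ in range(n_steps):
        particles = svgd_step(particles, grad_log_p, step_size, bandwidth)
    return particles

# Hypothetical target N(2, 1): grad log p(x) = -(x - 2).
initial = [-3.0 + 0.1 * i for i in range(20)]  # start far from the target
final = run_svgd(initial, lambda x: -(x - 2.0))
mean = sum(final) / len(final)
spread = max(final) - min(final)
```

In this sketch the first term drives particles toward regions of high target density, while the second term pushes them apart so that they do not collapse onto a single mode. The only information required about the target is the gradient of its log density.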
Part II: By viewing mixture components as particles, we develop a novel and efficient algorithm for learning diversity-promoting mixture models. We leverage an entropy functional to encourage exploration in the parameter space. This diversity-promoting algorithm enables us to build accurate density models of complex data, yielding quantitatively improved explanatory representations of the data.
Part III: By viewing the neurons in deep neural networks as particles and investigating new mechanisms for adaptively introducing new particles by splitting existing particles, we develop a new architecture learning framework to progressively grow neural architectures. Our framework efficiently optimizes the loss function, allowing us to learn neural network architectures that are both accurate in prediction and efficient in computational and energy cost.
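As an illustration of growing a network by splitting a neuron, here is a minimal sketch under stated assumptions. The abstract does not specify how a neuron is split or how the neuron to split is chosen. The sketch assumes a one-hidden-layer tanh network, a split into two copies moved in opposite directions in parameter space, and an equal division of the output weight. The names `forward` and `split_neuron` and the example parameters are hypothetical.

```python
import math

def forward(neurons, x):
    """Output of a one-hidden-layer net; each neuron is (out_weight, in_weight, bias)."""
    return sum(w * math.tanh(a * x + b) for (w, a, b) in neurons)

def split_neuron(neurons, index, direction, eps):
    """Replace neuron `index` by two copies at theta + eps*direction and theta - eps*direction.

    Each copy receives half of the original output weight (an assumption of
    this sketch), so the network output is unchanged at eps = 0 and changes
    only at second order in eps.
    """
    w, a, b = neurons[index]
    da, db = direction
    plus = (w / 2.0, a + eps * da, b + eps * db)
    minus = (w / 2.0, a - eps * da, b - eps * db)
    return neurons[:index] + [plus, minus] + neurons[index + 1:]

# Hypothetical two-neuron network, grown to three neurons by splitting the second.
net = [(1.0, 0.5, 0.0), (-2.0, 1.5, 0.3)]
grown = split_neuron(net, index=1, direction=(1.0, -1.0), eps=1e-3)
unchanged = split_neuron(net, index=1, direction=(1.0, -1.0), eps=0.0)
```

Because the two copies move symmetrically, the first-order changes cancel and the grown network starts from almost the same function as the original. Subsequent training can then move the copies apart only where doing so reduces the loss.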