Bayesian modeling with tractable inference and applications to online peer-to-peer lending

Date

2020-08-13

Authors

Zhang, Quan

Abstract

While modern machine learning and deep learning appear to dominate areas that demand scalability and modeling flexibility, Bayesian methods stand out when interpretability and high-quality uncertainty estimates are needed. Appreciating the beauty of Bayesian statistics, I have devoted myself to tractable Bayesian inference and interpretable modeling, with a particular interest in Markov chain Monte Carlo (MCMC), on which Bayesian inference depended heavily until variational inference emerged as an alternative. Accordingly, I develop novel MCMC and variational inference algorithms, together with interpretable models that explain complex mechanisms. This thesis proposes novel Bayesian inference algorithms and modeling frameworks for several fundamental problems in statistics, including general-purpose statistical inference with uncertainty quantification, multinomial classification, and survival analysis. I received tremendous help from my Ph.D. advisor, the committee members, and other collaborators, and many ideas were sparked through our collaboration.

In the first chapter of the thesis, we introduce MCMC-interactive variational inference, which exploits the complementary strengths of MCMC and variational inference. This algorithm not only estimates posteriors accurately and efficiently, but also facilitates the design of stochastic gradient MCMC algorithms and of transition kernels for Gibbs sampling. In the second chapter, we propose a permuted and augmented stick-breaking construction that casts multinomial classification as sequential decision making, extending any binary classifier trained with a cross-entropy loss to a Bayesian multinomial classifier while preserving the binary classifier's favorable properties. We develop a data augmentation scheme and an efficient Metropolis-Hastings algorithm that transform the sequential decision-making problem into several conditionally independent subproblems, so that they can be solved with parallel computing. In the third chapter, we propose Weibull delegate racing, which explicitly models survival under competing events and interprets non-monotonic covariate effects through an intuitive two-phase racing mechanism. For inference, we develop a Gibbs-sampling-based MCMC algorithm with data augmentation, along with a maximum a posteriori estimation algorithm for big data analysis. As an application, we analyze the time to loan payoff and default on Prosper.com, demonstrating not only strong model performance but also the value of standard and soft information on this peer-to-peer lending platform.
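As a rough illustration of the stick-breaking idea in the second chapter, the sketch below turns K-1 binary "stop at this category" probabilities, taken in a given order of the categories, into K multinomial probabilities. It is a minimal sketch under stated assumptions, not the thesis's algorithm: the names `sigmoid` and `stick_breaking_probs` and the example scores are hypothetical, the logistic function merely stands in for an arbitrary binary classifier, and the data augmentation and Metropolis-Hastings steps used for inference (including how the category order is handled) are not shown.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function; a hypothetical stand-in for the
    # output probability of any binary classifier.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def stick_breaking_probs(binary_probs: Sequence[float], order: Sequence[int]) -> List[float]:
    """Map K-1 sequential binary probabilities to K category probabilities.

    binary_probs[i] is the probability that the sequential decision stops at
    the i-th category listed in `order` (given it did not stop earlier); the
    last category in `order` receives whatever stick length remains.
    `order` is assumed to be a permutation of range(K).
    """
    K = len(order)
    if len(binary_probs) != K - 1:
        raise ValueError("need K-1 binary probabilities for K categories")
    probs = [0.0] * K
    remaining = 1.0  # length of the stick not yet broken off
    for i, category in enumerate(order[:-1]):
        probs[category] = remaining * binary_probs[i]  # break off a piece
        remaining *= 1.0 - binary_probs[i]             # shrink the stick
    probs[order[-1]] = remaining  # last category takes the remainder
    return probs


# Usage with hypothetical classifier scores for the two stopping decisions.
scores = [0.3, -1.2]
probs = stick_breaking_probs([sigmoid(s) for s in scores], order=[2, 0, 1])
print(probs, sum(probs))
```

Because the same binary probabilities assign mass differently under a different `order`, the ordering of categories is itself a modeling choice in this kind of construction.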
