Constrained relative entropy minimization with applications to multitask learning

Koyejo, Oluwasanmi Oluseye

Date

2013-05

Abstract

This dissertation addresses probabilistic inference via relative entropy minimization subject to expectation constraints. A canonical representation of the solution is derived without requiring the constraint set to be convex, and is given by members of an exponential family. The use of conjugate priors for relative entropy minimization is proposed, and a class of conjugate prior distributions is introduced. When the prior distribution is conjugate, an alternative representation of the solution is provided as members of the prior family. It is shown that the solutions can be found by direct optimization with respect to members of such parametric families. Constrained Bayesian inference is recovered as a special case with a specific choice of constraints induced by observed data.
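
As a rough illustration of the exponential-family form mentioned above, the following minimal sketch minimizes the relative entropy KL(q || p) over distributions q on a small discrete support, subject to a single expectation constraint E_q[f] = c. It is not taken from the dissertation: the uniform prior, the feature f(x) = x, the target value 4.5, and the helper names `tilt` and `min_relative_entropy` are all assumptions chosen for the example. In this simple setting the minimizer has the form q(x) ∝ p(x) exp(λ f(x)), and λ can be found by bisection because E_q[f] is non-decreasing in λ.

```python
import numpy as np

def tilt(prior, f, lam):
    """Exponentially tilted distribution q(x) proportional to prior(x) * exp(lam * f(x))."""
    logits = np.log(prior) + lam * f
    logits -= logits.max()  # shift for numerical stability before exponentiating
    q = np.exp(logits)
    return q / q.sum()

def min_relative_entropy(prior, f, target, lo=-50.0, hi=50.0, iters=200):
    """Minimize KL(q || prior) subject to E_q[f] = target.

    Uses bisection on the multiplier lam, relying on E_q[f] being
    non-decreasing in lam. Assumes target lies strictly between min(f) and max(f).
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilt(prior, f, mid) @ f < target:
            lo = mid
        else:
            hi = mid
    return tilt(prior, f, 0.5 * (lo + hi))

# Hypothetical example: uniform prior over the faces of a die,
# with the mean constrained to 4.5.
prior = np.full(6, 1.0 / 6.0)
f = np.arange(1, 7, dtype=float)
q = min_relative_entropy(prior, f, target=4.5)
```

With a uniform prior, successive probabilities in `q` differ by a constant ratio, which is the exponential-family structure in its simplest form.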

The framework is applied to the development of novel probabilistic models for multitask learning subject to constraints determined by domain expertise. First, a model is developed for multitask learning that jointly learns a low rank weight matrix and the prior covariance structure between different tasks. The multitask learning approach is then extended to a class of nonparametric statistical models for transposable data, incorporating side information such as graphs that describe inter-row and inter-column similarity. The resulting model combines a matrix-variate Gaussian process prior with inference subject to nuclear norm expectation constraints. In addition, a novel nonparametric model is proposed for multitask bipartite ranking. The proposed model combines a hierarchical matrix-variate Gaussian process prior with inference subject to ordering constraints and nuclear norm constraints, and is applied to disease gene prioritization. In many of these applications, the solution is found to be unique. Experimental results show substantial performance improvements over strong baseline models.
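
To make two of the ingredients named above concrete, the sketch below draws weight matrices from a matrix-variate Gaussian with separate row and column covariances and estimates the expected nuclear norm (the sum of singular values) by Monte Carlo. This is only an illustration under assumed values: the covariance matrices, the matrix sizes, and the helper names `sample_matrix_normal` and `nuclear_norm` are hypothetical and do not come from the dissertation, and the sketch does not perform the constrained inference itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_matrix_normal(row_cov, col_cov, rng):
    """Draw W = A Z B^T with A A^T = row_cov and B B^T = col_cov,
    so that W is a zero-mean matrix-variate Gaussian sample."""
    A = np.linalg.cholesky(row_cov)
    B = np.linalg.cholesky(col_cov)
    Z = rng.standard_normal((row_cov.shape[0], col_cov.shape[0]))
    return A @ Z @ B.T

def nuclear_norm(W):
    """Nuclear norm: the sum of the singular values of W."""
    return np.linalg.svd(W, compute_uv=False).sum()

# Hypothetical covariances: 4 correlated rows (e.g. tasks), 3 independent columns.
row_cov = 0.5 * np.eye(4) + 0.5 * np.ones((4, 4))
col_cov = np.eye(3)

samples = [sample_matrix_normal(row_cov, col_cov, rng) for _ in range(500)]
expected_nuc = np.mean([nuclear_norm(W) for W in samples])  # Monte Carlo estimate
```

The nuclear norm is a common convex surrogate for matrix rank, so an upper bound on its expectation would favor distributions concentrated on approximately low rank matrices.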