Discovering latent structures in syntax trees and mixed-type data

Date

2016-05-04

Authors

Sun, Liang, 1988-

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Gibbs sampling is a widely applied algorithm to estimate parameters in statistical models. This thesis uses Gibbs sampling to resolve practical problems, especially on natural language processing and mixed type data. It includes three independent studies. The first study includes a Bayesian model for learning latent annotations. The technique is capable of parsing sentences in a wide variety of languages, producing results that are on-par with or surpass previous approaches in accuracy, and shows promising potential for parsing low-resource languages. The second study presents a method to automatically complete annotations from partially-annotated sentence data, with the help of Gibbs sampling. The algorithm significantly reduces the time required to annotate sentences for natural language processing, without a significant drop in annotation accuracy. The last study proposes a novel factor model for uncovering latent factors and exploring covariation among multiple outcomes of mixed types, including binary, count, and continuous data. Gibbs sampling is used to estimate model parameters. The algorithm successfully discovers correlation structures of mixed-type data in both simulated and real-word data.

Description

LCSH Subject Headings

Citation