Show simple item record

dc.contributor.advisor | Williamson, Sinead | en
dc.creator | Schaefer, Kayla Hope | en
dc.date.accessioned | 2015-11-16T18:47:17Z | en
dc.date.available | 2015-11-16T18:47:17Z | en
dc.date.issued | 2015-05 | en
dc.date.submitted | May 2015 | en
dc.identifier | doi:10.15781/T2N334 | en
dc.identifier.uri | http://hdl.handle.net/2152/32498 | en
dc.description | text | en
dc.description.abstract | Since its introduction, topic modeling has been a fundamental tool in analyzing corpus structures. While the Relational Topic Model provides a way to link, and subsequently cluster, documents together as an extension of the original Latent Dirichlet Allocation (LDA) model, this paper seeks to form a document clustering model for the nonparametric alternative to LDA, the Dirichlet Process. As the structure of Shakespeare's tragedies is the focus of this work, we specifically cluster documents while modeling the text using a Hierarchical Dirichlet Process (HDP), which allows for a mixture model with shared mixture components, in order to capture the natural topic clustering within a play. Using collapsed Gibbs sampling, the effectiveness of the clustered HDP is compared against that of LDA and an HDP without document clustering. This is done using both log perplexity and a qualitative assessment of the returned topics. Furthermore, clustering is performed and analyzed individually on speeches from each of ten tragedies, as well as with a combined corpus of acts. | en
dc.format.mimetype | application/pdf | en
dc.subject | Clustering | en
dc.subject | Nonparametric Bayesian statistics | en
dc.subject | Hierarchical models | en
dc.subject | Gibbs sampling | en
dc.subject | Shakespeare | en
dc.title | Document clustering with nonparametric hierarchical topic modeling | en
dc.type | Thesis | en
dc.date.updated | 2015-11-16T18:47:17Z | en
dc.contributor.committeeMember | Zhou, Mingyuan | en
dc.description.department | Statistics | en
thesis.degree.department | Statistics | en
thesis.degree.discipline | Statistics | en
thesis.degree.grantor | The University of Texas at Austin | en
thesis.degree.level | Masters | en
thesis.degree.name | Master of Science in Statistics | en


