Word meaning in context as a paraphrase distribution : evidence, learning, and inference

dc.contributor.advisor: Erk, Katrin
dc.contributor.committeeMember: Baldridge, Jason
dc.contributor.committeeMember: Bannard, Colin
dc.contributor.committeeMember: Dhillon, Inderjit
dc.contributor.committeeMember: Mooney, Raymond
dc.creator: Moon, Taesun, Ph. D.
dc.date.accessioned: 2011-10-25T14:45:37Z
dc.date.available: 2011-10-25T14:45:37Z
dc.date.issued: 2011-08
dc.date.submitted: August 2011
dc.date.updated: 2011-10-25T14:45:47Z
dc.description: text
dc.description.abstract: In this dissertation, we introduce a graph-based model of instance-based usage meaning that is cast as a problem of probabilistic inference. The main aim of this model is to provide a flexible platform that can be used to explore multiple hypotheses about usage meaning computation. Our model takes up and extends the proposals of Erk and Pado [2007] and McCarthy and Navigli [2009] by representing usage meaning as a probability distribution over potential paraphrases. We use undirected graphical models to infer this probability distribution for every content word in a given sentence. Graphical models represent complex probability distributions through a graph: nodes stand for random variables, edges stand for direct probabilistic interactions between them, and the lack of an edge between two variables reflects an independence assumption. In our model, we represent each content word of the sentence through two adjacent nodes: the observed node represents the surface form of the word itself, and the hidden node represents its usage meaning. The distribution over values that we infer for the hidden node is a paraphrase distribution for the observed word. To encode the fact that lexical semantic information is exchanged between syntactic neighbors, the graph contains edges that mirror the dependency graph of the sentence. Further knowledge sources that influence the hidden nodes are represented through additional edges that, for example, connect to the document topic. The integration of adjacent knowledge sources is accomplished in the standard way, by multiplying factors and marginalizing over variables. Evaluating on a paraphrasing task, we find that our model outperforms the current state-of-the-art usage vector model [Thater et al., 2010] on all parts of speech except verbs, where the previous model wins by a small margin. But our main focus is not on the numbers but on the fact that our model is flexible enough to encode different hypotheses about usage meaning computation. In particular, we concentrate on five questions (with minor variants):

- Nonlocal syntactic context: Existing usage vector models use only a word's direct syntactic neighbors for disambiguation or for inferring some other meaning representation. Would it help to instead let contextual information "flow" along the entire dependency graph, with each word's inferred meaning relying on the paraphrase distributions of its neighbors?
- Influence of collocational information: In some cases, it is intuitively plausible to use the selectional preference of a neighboring word towards the target to determine the target's meaning in context. How does incorporating selectional preferences into the model affect performance?
- Non-syntactic bag-of-words context: To what extent can non-syntactic information in the form of bag-of-words context help in inferring meaning?
- Effects of parametrization: We experiment with two transformations of the maximum likelihood estimate (MLE): one interpolates several MLEs, and the other transforms the MLE by exponentiating pointwise mutual information. Which performs better?
- Type of hidden nodes: Our model posits a tier of hidden nodes immediately adjacent to the surface tier of observed words to capture dynamic usage meaning. We examine two variants of the model: in one, the hidden nodes take actual words as values; in the other, they take nameless indices as values. The former has the benefit of interpretability, while the latter allows more standard parameter estimation.

Portions of this dissertation are derived from joint work between the author and Katrin Erk [submitted].
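The abstract describes inference in an undirected graphical model where a hidden node's paraphrase distribution is obtained by multiplying factors contributed by adjacent knowledge sources and then normalizing (marginalization over a single variable). The following is a minimal illustrative sketch of that multiply-then-normalize step, not the dissertation's implementation; the candidate paraphrases, factor scores, and function names are hypothetical.

# Minimal sketch (hypothetical values): infer a paraphrase distribution for one
# hidden node by multiplying the scores that adjacent factors assign to each
# candidate paraphrase, then normalizing the products into probabilities.

def paraphrase_distribution(candidates, factors):
    """Multiply each factor's score for every candidate paraphrase,
    then normalize the products into a probability distribution."""
    scores = {}
    for p in candidates:
        score = 1.0
        for factor in factors:
            score *= factor.get(p, 1e-9)  # small floor so unseen paraphrases are not zeroed out
        scores[p] = score
    total = sum(scores.values())
    return {p: s / total for p, s in scores.items()}

# Hypothetical factors for the target "coach" in "The coach praised the team":
observation_factor = {"trainer": 0.6, "bus": 0.2, "instructor": 0.2}    # surface word -> paraphrase
neighbor_factor = {"trainer": 0.7, "bus": 0.05, "instructor": 0.25}     # preference of the verb "praised"

print(paraphrase_distribution(["trainer", "bus", "instructor"],
                              [observation_factor, neighbor_factor]))

Exact computation of this kind is only feasible for small neighborhoods; the sketch is meant only to make concrete the factor multiplication and marginalization named in the abstract.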
dc.description.department: Linguistics
dc.format.mimetype: application/pdf
dc.identifier.slug: 2152/ETD-UT-2011-08-4143
dc.identifier.uri: http://hdl.handle.net/2152/ETD-UT-2011-08-4143
dc.language.iso: eng
dc.subject: Computational linguistics
dc.subject: Lexical semantics
dc.subject: Probabilistic graphical models
dc.subject: Natural language processing
dc.subject: Word sense disambiguation
dc.subject: Paraphrasing
dc.title: Word meaning in context as a paraphrase distribution : evidence, learning, and inference
dc.type.genre: thesis
thesis.degree.department: Linguistics
thesis.degree.discipline: Linguistics
thesis.degree.grantor: University of Texas at Austin
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Original bundle: MOON-DISSERTATION.pdf (958.08 KB, Adobe Portable Document Format)

License bundle: license.txt (2.12 KB, Plain Text)