Word meaning in context as a paraphrase distribution : evidence, learning, and inference

dc.contributor.advisor: Erk, Katrin
dc.contributor.committeeMember: Baldridge, Jason
dc.contributor.committeeMember: Bannard, Colin
dc.contributor.committeeMember: Dhillon, Inderjit
dc.contributor.committeeMember: Mooney, Raymond
dc.creator: Moon, Taesun, Ph. D.
dc.date.accessioned: 2011-10-25T14:45:37Z
dc.date.available: 2011-10-25T14:45:37Z
dc.date.issued: 2011-08
dc.date.submitted: August 2011
dc.date.updated: 2011-10-25T14:45:47Z
dc.description: text
dc.description.abstract: In this dissertation, we introduce a graph-based model of instance-based usage meaning that is cast as a problem of probabilistic inference. The main aim of this model is to provide a flexible platform that can be used to explore multiple hypotheses about usage meaning computation. Our model takes up and extends the proposals of Erk and Pado [2007] and McCarthy and Navigli [2009] by representing usage meaning as a probability distribution over potential paraphrases. We use undirected graphical models to infer this probability distribution for every content word in a given sentence. Graphical models represent complex probability distributions through a graph: nodes stand for random variables, edges stand for direct probabilistic interactions between them, and the lack of an edge between two variables reflects an independence assumption. In our model, we represent each content word of the sentence through two adjacent nodes: the observed node represents the surface form of the word itself, and the hidden node represents its usage meaning. The distribution over values that we infer for the hidden node is a paraphrase distribution for the observed word. To encode the fact that lexical semantic information is exchanged between syntactic neighbors, the graph contains edges that mirror the dependency graph of the sentence. Further knowledge sources that influence the hidden nodes are represented through additional edges that, for example, connect to the document topic. The integration of adjacent knowledge sources is accomplished in the standard way, by multiplying factors and marginalizing over variables. Evaluating on a paraphrasing task, we find that our model outperforms the current state-of-the-art usage vector model [Thater et al., 2010] on all parts of speech except verbs, where the previous model wins by a small margin. But our main focus is not on the numbers but on the fact that our model is flexible enough to encode different hypotheses about usage meaning computation. In particular, we concentrate on five questions (with minor variants):

- Nonlocal syntactic context: Existing usage vector models use only a word's direct syntactic neighbors for disambiguation or for inferring some other meaning representation. Would it help to instead let contextual information "flow" along the entire dependency graph, with each word's inferred meaning relying on the paraphrase distributions of its neighbors?
- Influence of collocational information: In some cases, it is intuitively plausible to use the selectional preference of a neighboring word towards the target to determine the target's meaning in context. How does incorporating selectional preferences into the model affect performance?
- Non-syntactic bag-of-words context: To what extent can non-syntactic information in the form of bag-of-words context help in inferring meaning?
- Effects of parametrization: We experiment with two transformations of the maximum likelihood estimate (MLE): one interpolates several MLEs, and the other transforms the MLE by exponentiating pointwise mutual information. Which performs better?
- Type of hidden nodes: Our model posits a tier of hidden nodes immediately adjacent to the surface tier of observed words to capture dynamic usage meaning. We examine two variants of the model: in one, the hidden nodes take actual words as values; in the other, they take nameless indices as values. The former has the benefit of interpretability, while the latter allows more standard parameter estimation.

Portions of this dissertation are derived from joint work between the author and Katrin Erk [submitted].
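The abstract describes inference in an undirected graphical model where a hidden node's paraphrase distribution is obtained by multiplying factors contributed by adjacent knowledge sources and then normalizing (marginalization over a single variable). The following is a minimal illustrative sketch of that multiply-then-normalize step, not the dissertation's implementation; the candidate paraphrases, factor scores, and function names are hypothetical.

# Minimal sketch (hypothetical values): infer a paraphrase distribution for one
# hidden node by multiplying the scores that adjacent factors assign to each
# candidate paraphrase, then normalizing the products into probabilities.

def paraphrase_distribution(candidates, factors):
    """Multiply each factor's score for every candidate paraphrase,
    then normalize the products into a probability distribution."""
    scores = {}
    for p in candidates:
        score = 1.0
        for factor in factors:
            score *= factor.get(p, 1e-9)  # small floor so unseen paraphrases are not zeroed out
        scores[p] = score
    total = sum(scores.values())
    return {p: s / total for p, s in scores.items()}

# Hypothetical factors for the target "coach" in "The coach praised the team":
observation_factor = {"trainer": 0.6, "bus": 0.2, "instructor": 0.2}    # surface word -> paraphrase
neighbor_factor = {"trainer": 0.7, "bus": 0.05, "instructor": 0.25}     # preference of the verb "praised"

print(paraphrase_distribution(["trainer", "bus", "instructor"],
                              [observation_factor, neighbor_factor]))

Exact computation of this kind is only feasible for small neighborhoods; the sketch is meant only to make concrete the factor multiplication and marginalization named in the abstract.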
dc.description.department: Linguistics
dc.format.mimetype: application/pdf
dc.identifier.slug: 2152/ETD-UT-2011-08-4143
dc.identifier.uri: http://hdl.handle.net/2152/ETD-UT-2011-08-4143
dc.language.iso: eng
dc.subject: Computational linguistics
dc.subject: Lexical semantics
dc.subject: Probabilistic graphical models
dc.subject: Natural language processing
dc.subject: Word sense disambiguation
dc.subject: Paraphrasing
dc.title: Word meaning in context as a paraphrase distribution : evidence, learning, and inference
dc.type.genre: thesis
thesis.degree.department: Linguistics
thesis.degree.discipline: Linguistics
thesis.degree.grantor: University of Texas at Austin
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Original bundle: MOON-DISSERTATION.pdf (958.08 KB, Adobe Portable Document Format)

License bundle: license.txt (2.12 KB, Plain Text)