Learning language from ambiguous perceptual context

dc.contributor.advisor: Mooney, Raymond J. (Raymond Joseph)
dc.contributor.committeeMember: Barzilay, Regina
dc.contributor.committeeMember: Erk, Katrin
dc.contributor.committeeMember: Grauman, Kristen
dc.contributor.committeeMember: Stone, Peter
dc.creator: Chen, David Lieh-Chiang
dc.date.accessioned: 2012-07-05T19:16:55Z
dc.date.available: 2012-07-05T19:16:55Z
dc.date.issued: 2012-05
dc.date.submitted: May 2012
dc.date.updated: 2012-07-05T19:17:08Z
dc.description: text
dc.description.abstract: Building a computer system that can understand human languages has been one of the long-standing goals of artificial intelligence. Currently, most state-of-the-art natural language processing (NLP) systems use statistical machine learning methods to extract linguistic knowledge from large, annotated corpora. However, constructing such corpora can be expensive and time-consuming because annotating such data requires expertise. In this thesis, we explore alternative ways of learning that do not rely on direct human supervision. In particular, we draw inspiration from the fact that humans are able to learn language through exposure to linguistic inputs in the context of a rich, relevant, perceptual environment.
We first present a system that learned to sportscast for RoboCup simulation games by observing how humans commentate on a game. Using the simple assumption that people generally talk about events that have just occurred, we pair each textual comment with a set of events that it could be referring to. By applying an EM-like algorithm, the system simultaneously learns a grounded language model and aligns each description to the corresponding event. The system does not use any prior language knowledge and was able to learn to sportscast in both English and Korean. Human evaluations of the generated commentaries indicate they are of reasonable quality and in some cases even on par with those produced by humans.
For the sportscasting task, while each comment could be aligned to one of several events, the level of ambiguity was low enough that we could enumerate all the possible alignments. However, it is not always possible to restrict the set of possible alignments to such limited numbers. Thus, we present another system that allows each sentence to be aligned to one of exponentially many connected subgraphs without explicitly enumerating them. The system first learns a lexicon and uses it to prune the nodes in the graph that are unrelated to the words in the sentence. By only observing how humans follow navigation instructions, the system was able to infer the corresponding hidden navigation plans and parse previously unseen instructions in new environments for both English and Chinese data.
With the rise in popularity of crowdsourcing, we also present results on collecting additional training data using Amazon’s Mechanical Turk. Since our system only needs supervision in the form of language being used in relevant contexts, it is easy for virtually anyone to contribute to the training data.
dc.description.department: Computer Sciences
dc.format.mimetype: application/pdf
dc.identifier.slug: 2152/ETD-UT-2012-05-5203
dc.identifier.uri: http://hdl.handle.net/2152/ETD-UT-2012-05-5203
dc.language.iso: eng
dc.subject: Natural language processing
dc.subject: Natural language learning
dc.subject: Connecting language and perception
dc.subject: Machine learning
dc.subject: Artificial intelligence
dc.title: Learning language from ambiguous perceptual context
dc.type.genre: thesis
thesis.degree.department: Computer Sciences
thesis.degree.discipline: Computer Sciences
thesis.degree.grantor: University of Texas at Austin
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
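
The abstract describes pairing each comment with a set of candidate events and using an EM-like algorithm to resolve which event each comment refers to. The toy sketch below illustrates that general idea only; it is not the thesis's implementation. The function name `em_align`, the simple word-given-event model, the smoothing constant, and the example data are all assumptions made for illustration.

```python
from collections import defaultdict


def em_align(examples, iterations=10, smoothing=1e-3):
    """Toy EM-style alignment (illustrative assumption, not the thesis's code).

    examples: list of (words, candidate_events) pairs.
    Returns the most probable candidate event for each example.
    """
    vocab = {word for words, _ in examples for word in words}

    # Start with uniform alignment weights over each comment's candidates.
    weights = [{e: 1.0 / len(cands) for e in cands} for _, cands in examples]

    for _ in range(iterations):
        # M-step: re-estimate p(word | event) from weighted co-occurrence counts.
        counts = defaultdict(lambda: defaultdict(float))
        for (words, _), by_event in zip(examples, weights):
            for event, weight in by_event.items():
                for word in words:
                    counts[event][word] += weight

        def prob(word, event):
            total = sum(counts[event].values())
            return (counts[event][word] + smoothing) / (total + smoothing * len(vocab))

        # E-step: re-weight each candidate by the likelihood of the comment's words.
        new_weights = []
        for words, cands in examples:
            scores = {}
            for event in cands:
                score = 1.0
                for word in words:
                    score *= prob(word, event)
                scores[event] = score
            norm = sum(scores.values())
            new_weights.append({e: s / norm for e, s in scores.items()})
        weights = new_weights

    return [max(by_event, key=by_event.get) for by_event in weights]


# Hypothetical data: each comment is ambiguous between two recent events.
examples = [
    (["purple", "passes"], ["pass", "kick"]),
    (["purple", "passes"], ["pass", "steal"]),
    (["purple", "shoots"], ["kick", "pass"]),
    (["purple", "shoots"], ["kick", "steal"]),
    (["purple", "steals"], ["steal", "pass"]),
    (["purple", "steals"], ["steal", "kick"]),
]
alignments = em_align(examples)
```

No single comment is unambiguous here, but a word such as "passes" co-occurs with the "pass" event more consistently than with any other, so the alignment weights shift toward it over the iterations. Enumerating every candidate like this is only feasible when, as the abstract notes for the sportscasting task, the number of possible alignments is small.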

Access full-text files

Original bundle

Name: CHEN-DISSERTATION.pdf
Size: 2.82 MB
Format: Adobe Portable Document Format