Knowledge integration in machine reading

dc.contributor.advisor: Porter, Bruce, 1956- (en)
dc.contributor.committeeMember: Allen, James F. (en)
dc.contributor.committeeMember: Barker, Kenneth J. (en)
dc.contributor.committeeMember: Lifschitz, Vladimir (en)
dc.contributor.committeeMember: Mooney, Raymond J. (en)
dc.creator: Kim, Doo Soon (en)
dc.date.accessioned: 2011-11-04T19:41:35Z (en)
dc.date.available: 2011-11-04T19:41:35Z (en)
dc.date.issued: 2011-08 (en)
dc.date.submitted: August 2011 (en)
dc.date.updated: 2011-11-04T19:41:45Z (en)
dc.description: text (en)
dc.description.abstract: Machine reading is the artificial-intelligence task of automatically reading a corpus of texts and, from the contents, building a knowledge base that supports automated reasoning and question answering. Success at this task could fundamentally solve the knowledge acquisition bottleneck – the widely recognized problem that knowledge-based AI systems are difficult and expensive to build because of the difficulty of acquiring knowledge from authoritative sources and building useful knowledge bases. One challenge inherent in machine reading is knowledge integration – the task of correctly and coherently combining knowledge snippets extracted from texts. This dissertation shows that knowledge integration can be automated and that it can significantly improve the performance of machine reading. We specifically focus on two contributions of knowledge integration. The first contribution is for improving the coherence of learned knowledge bases to better support automated reasoning and question answering. Knowledge integration achieves this benefit by aligning knowledge snippets that contain overlapping content. The alignment is difficult because the snippets can use significantly different surface forms. In one common type of variation, two snippets might contain overlapping content that is expressed at different levels of granularity or detail. Our matcher can “see past” this difference to align knowledge snippets drawn from a single document, from multiple documents, or from a document and a background knowledge base. The second contribution is for improving text interpretation. Our approach is to delay ambiguity resolution to enable a machine-reading system to maintain multiple candidate interpretations. This is useful because typically, as the system reads through texts, evidence accumulates to help the knowledge integration system resolve ambiguities correctly. To avoid a combinatorial explosion in the number of candidate interpretations, we propose the packed representation to compactly encode all the candidates. Also, we present an algorithm that prunes interpretations from the packed representation as evidence accumulates. We evaluate our work by building and testing two prototype machine reading systems and measuring the quality of the knowledge bases they construct. The evaluation shows that our knowledge integration algorithms improve the cohesiveness of the knowledge bases, indicating their improved ability to support automated reasoning and question answering. The evaluation also shows that our approach to postponing ambiguity resolution improves the system’s accuracy at text interpretation. (en)
dc.description.department: Computer Sciences (en)
dc.format.mimetype: application/pdf (en)
dc.identifier.slug: 2152/ETD-UT-2011-08-4049 (en)
dc.identifier.uri: http://hdl.handle.net/2152/ETD-UT-2011-08-4049 (en)
dc.language.iso: eng (en)
dc.subject: Machine reading (en)
dc.subject: Knowledge integration (en)
dc.subject: Artificial intelligence (en)
dc.subject: Text understanding (en)
dc.subject: NLP (en)
dc.title: Knowledge integration in machine reading (en)
dc.type.genre: thesis (en)
thesis.degree.department: Computer Sciences (en)
thesis.degree.discipline: Computer Science (en)
thesis.degree.grantor: University of Texas at Austin (en)
thesis.degree.level: Doctoral (en)
thesis.degree.name: Doctor of Philosophy (en)

Access full-text files

Original bundle

Name: KIM-DISSERTATION.pdf
Size: 2.1 MB
Format: Adobe Portable Document Format
