Knowledge integration in machine reading

Access full-text files




Kim, Doo Soon

Journal Title

Journal ISSN

Volume Title



Machine reading is the artificial-intelligence task of automatically reading a corpus of texts and, from the contents, building a knowledge base that supports automated reasoning and question answering. Success at this task could fundamentally solve the knowledge acquisition bottleneck – the widely recognized problem that knowledge-based AI systems are difficult and expensive to build because of the difficulty of acquiring knowledge from authoritative sources and building useful knowledge bases. One challenge inherent in machine reading is knowledge integration – the task of correctly and coherently combining knowledge snippets extracted from texts. This dissertation shows that knowledge integration can be automated and that it can significantly improve the performance of machine reading.

We specifically focus on two contributions of knowledge integration. The first contribution is for improving the coherence of learned knowledge bases to better support automated reasoning and question answering. Knowledge integration achieves this benefit by aligning knowledge snippets that contain overlapping content. The alignment is difficult because the snippets can use significantly different surface forms. In one common type of variation, two snippets might contain overlapping content that is expressed at different levels of granularity or detail. Our matcher can “see past” this difference to align knowledge snippets drawn from a single document, from multiple documents, or from a document and a background knowledge base.

The second contribution is for improving text interpretation. Our approach is to delay ambiguity resolution to enable a machine-reading system to maintain multiple candidate interpretations. This is useful because typically, as the system reads through texts, evidence accumulates to help the knowledge integration system resolve ambiguities correctly. To avoid a combinatorial explosion in the number of candidate interpretations, we propose the packed representation to compactly encode all the candidates. Also, we present an algorithm that prunes interpretations from the packed representation as evidence accumulates.

We evaluate our work by building and testing two prototype machine reading systems and measuring the quality of the knowledge bases they construct. The evaluation shows that our knowledge integration algorithms improve the cohesiveness of the knowledge bases, indicating their improved ability to support automated reasoning and question answering. The evaluation also shows that our approach to postponing ambiguity resolution improves the system’s accuracy at text interpretation.



LCSH Subject Headings