Learning for semantic parsing and natural language generation using statistical machine translation techniques
One of the main goals of natural language processing (NLP) is to build automated systems that can understand and generate human languages. This goal has so far remained elusive. Existing hand-crafted systems can provide in-depth analysis of domain sub-languages, but are often notoriously fragile and costly to build. Existing machine-learned systems are considerably more robust, but are limited to relatively shallow NLP tasks.

In this thesis, we present novel statistical methods for robust natural language understanding and generation. We focus on two important sub-tasks, semantic parsing and tactical generation. The key idea is that both tasks can be treated as translation between natural languages and formal meaning-representation languages, and can therefore be performed using state-of-the-art statistical machine translation techniques. Specifically, we use a technique called synchronous parsing, which has been extensively used in syntax-based machine translation, as the unifying framework for semantic parsing and tactical generation. The parsing and generation algorithms learn all of their linguistic knowledge from annotated corpora, and can handle natural-language sentences that are conceptually complex.

A nice feature of our algorithms is that the semantic parsers and tactical generators share the same learned synchronous grammars. Moreover, charts are used as the unifying language-processing architecture for efficient parsing and generation. The generators can therefore be said to be the inverse of the parsers, an elegant property that has been widely advocated. Furthermore, we show that our parsers and generators can handle formal meaning-representation languages containing logical variables, including predicate logic. Our basic semantic parsing algorithm is called WASP. Most of the other parsing and generation algorithms presented in this thesis are extensions of WASP or its inverse.
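The central idea, that a single synchronous grammar pairs each natural-language pattern with a meaning-representation pattern so that one derivation yields both strings, can be sketched as follows. This is a minimal illustrative sketch: the toy rules, the GeoQuery-style predicates, and the `derive` function are assumptions made for exposition, not the grammars or algorithms WASP actually learns.

```python
# A toy synchronous grammar: each rule pairs a natural-language (NL)
# pattern with a meaning-representation (MR) pattern. Nonterminals
# (QUERY, NUM, PLACE) appear in both sides and are expanded in lockstep.
# These rules and predicates are illustrative, not WASP's learned grammar.
RULES = {
    "QUERY": [("what is NUM", "answer(NUM)")],
    "NUM":   [("the population of PLACE", "population(PLACE)")],
    "PLACE": [("texas", "stateid(texas)")],
}

def derive(symbol="QUERY"):
    """Expand nonterminals on both sides simultaneously, so the same
    derivation produces a sentence and its meaning representation."""
    nl, mr = RULES[symbol][0]  # take the first production for simplicity
    for nonterminal in RULES:
        if nonterminal in nl:
            sub_nl, sub_mr = derive(nonterminal)
            nl = nl.replace(nonterminal, sub_nl)
            mr = mr.replace(nonterminal, sub_mr)
    return nl, mr

sentence, meaning = derive()
print(sentence)  # what is the population of texas
print(meaning)   # answer(population(stateid(texas)))
```

Because the NL and MR sides are generated by the same derivation, running the rules "forward" yields generation while searching for a derivation that matches an input sentence yields semantic parsing, which is why the parsers and generators in this thesis can share one learned grammar.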
We demonstrate the effectiveness of our parsing and generation algorithms by performing experiments in two real-world, restricted domains. Experimental results show that our algorithms are more robust and accurate than the best current systems that require similar supervision. Our work is also the first attempt to use the same automatically-learned grammar for both parsing and generation. Unlike previous systems that require manually-constructed grammars and lexicons, our systems require much less knowledge engineering and can be easily ported to other languages and domains.