Semantic interpretation with distributional analysis
Unstructured text contains a wealth of knowledge; however, it is in a form unsuitable for reasoning. Semantic interpretation is the task of processing natural language text to create or extend a coherent, formal knowledge base that supports reasoning and question answering. This task involves entity, event, and relation extraction, co-reference resolution, and inference. Many domains, from intelligence data to bioinformatics, would benefit from semantic interpretation. Yet traditional approaches to these subtasks typically require a large annotated corpus specific to a single domain and ontology. This dissertation describes an approach to rapidly training a semantic interpreter from a set of seed annotations and a large, unlabeled corpus. Our approach adapts methods from paraphrase acquisition and automatic thesaurus construction to extend seed syntactic-to-semantic mappings using an automatically gathered, domain-specific parallel corpus. During interpretation, the system uses joint probabilistic inference to select the most probable interpretation consistent with the background knowledge. We evaluate both the quality of the extended mappings and the performance of the semantic interpreter.
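The general idea of extending seed mappings with distributional similarity can be illustrated with a small sketch. It is not the dissertation's implementation. It assumes that each syntactic pattern is represented by counts of the argument fillers it co-occurs with. It uses cosine similarity, one common choice in automatic thesaurus construction. An unlabeled pattern inherits the semantic relation of its most similar seed pattern when that similarity clears a threshold. The pattern strings, relation names, feature counts, the helper names `cosine` and `extend_mappings`, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse context-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extend_mappings(seeds, contexts, threshold=0.5):
    """Hypothetical helper: label unseen patterns with the relation of the
    most similar seed pattern, if that similarity reaches `threshold`.

    seeds:    syntactic pattern -> semantic relation (seed annotations)
    contexts: syntactic pattern -> Counter of context features (argument fillers)
    """
    extended = dict(seeds)
    for pattern, vec in contexts.items():
        if pattern in seeds:
            continue
        best_rel, best_sim = None, 0.0
        for seed_pattern, relation in seeds.items():
            sim = cosine(vec, contexts.get(seed_pattern, Counter()))
            if sim > best_sim:
                best_rel, best_sim = relation, sim
        if best_rel is not None and best_sim >= threshold:
            extended[pattern] = best_rel
    return extended


# Toy, made-up context counts gathered from an unlabeled corpus.
contexts = {
    "X acquired Y": Counter({"X=Google": 3, "Y=YouTube": 2, "X=Oracle": 1, "Y=Sun": 1}),
    "X bought Y": Counter({"X=Google": 2, "Y=YouTube": 1, "X=Oracle": 1, "Y=Sun": 2}),
    "X was born in Y": Counter({"X=Obama": 2, "Y=Hawaii": 2}),
}
seeds = {"X acquired Y": "Acquisition"}

extended = extend_mappings(seeds, contexts)
print(extended)
```

Here "X bought Y" shares argument fillers with the seed "X acquired Y", so it inherits the `Acquisition` relation. "X was born in Y" shares no fillers and stays unmapped. A real system would use far larger corpora and could weight features (for example with pointwise mutual information) rather than using raw counts.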