Learning better latent representations from semantic knowledge
Many modern efforts in Natural Language Processing rely on deep neural network models that learn dense vector representations for words and sentences, and these representations have proven effective in many downstream tasks. However, their vulnerability to adversarial attacks and their limited generalization to unseen domains cast doubt on whether they truly capture the meaning of language. In this thesis, we investigate the use of semantic knowledge to help neural models learn better representations. We start with one semantic phenomenon, implicit predicate-argument relations, and propose two neural models that draw on narrative event coherence and entity salience. We also introduce an argument cloze task that automatically creates synthetic training data at scale from structural representations of events and entities. We demonstrate that, when trained on this large-scale synthetic data, both models perform well on a human-annotated dataset of nominal implicit arguments. We then turn to integrating a broader range of semantic knowledge into neural models in a more latent manner. We find that by injecting coreference knowledge as auxiliary supervision for self-attention, a relatively small model achieves state-of-the-art results on a word prediction task specifically designed to require long-distance reasoning. Finally, we explore different ways of integrating semantic knowledge into large-scale pre-trained language models to make them more generalizable to out-of-domain question answering tasks, and present preliminary results.
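The argument cloze idea mentioned above can be illustrated with a minimal sketch: given event tuples extracted from a document, hide one argument of one event and treat it as the answer to be recovered from context. All names and data structures here are illustrative assumptions, not the thesis's actual implementation; in particular, `Event` and `make_cloze_instances` are hypothetical.

```python
# Hypothetical sketch of argument-cloze instance creation. An entity is
# only used as an answer if it also appears elsewhere in the document,
# so the surrounding events provide evidence for recovering it.
from typing import NamedTuple

class Event(NamedTuple):
    predicate: str
    args: dict  # role label -> entity id

def make_cloze_instances(events):
    """Return (context_events, masked_event, masked_role, answer) tuples."""
    instances = []
    for i, ev in enumerate(events):
        context = events[:i] + events[i + 1:]
        context_entities = {e for c in context for e in c.args.values()}
        for role, entity in ev.args.items():
            if entity in context_entities:  # recoverable from context
                masked = Event(ev.predicate,
                               {r: e for r, e in ev.args.items() if r != role})
                instances.append((context, masked, role, entity))
    return instances

# Toy document: three events over two entities, e1 and e2.
events = [
    Event("buy", {"subj": "e1", "obj": "e2"}),
    Event("pay", {"subj": "e1"}),
    Event("own", {"subj": "e1", "obj": "e2"}),
]
for ctx, ev, role, ans in make_cloze_instances(events):
    print(ev.predicate, role, "->", ans)
```

Because the instances are derived mechanically from extracted event structures, this style of construction scales to arbitrarily large corpora without human annotation, which is what enables the large-scale synthetic training described in the abstract.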
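The auxiliary coreference supervision can likewise be sketched in miniature: one way to inject coreference knowledge into self-attention is to penalize the divergence between an attention head's distribution and a target distribution placed over coreferent tokens. This is a toy sketch under that assumption, not the thesis's actual loss; `coref_attention_loss` and its uniform-target choice are hypothetical.

```python
# Toy auxiliary loss: cross-entropy between one query token's attention
# distribution and a uniform target over its coreferent tokens. In a real
# model this would be added to the main task loss during training.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def coref_attention_loss(scores, coref_links):
    """scores: raw attention logits for one query over all tokens.
    coref_links: indices of tokens coreferent with the query."""
    attn = softmax(scores)
    target = [1 / len(coref_links) if i in coref_links else 0.0
              for i in range(len(scores))]
    # Cross-entropy over the coreferent positions only (target is 0 elsewhere).
    return -sum(t * math.log(a + 1e-12)
                for t, a in zip(target, attn) if t > 0)
```

Attention that concentrates on a coreferent token incurs a small loss, while attention that ignores it incurs a large one, nudging the head toward coreference-aware patterns.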