Fine-grained evaluation for text summarization




Goyal, Tanya (Ph.D. in computer science)




As AI systems continue to improve, reliable evaluation is needed to measure progress. In the text summarization literature, a good automatic evaluation metric surfaces the same rankings between competing systems as a human annotator would. However, standard metrics rely on surface-level cues and struggle to do this. In this thesis, we describe work that builds reliable evaluation frameworks for the text summarization task, focusing on localized modeling approaches that provide actionable insights. Our work targets two important aspects of summary quality: factuality and coherence of generated text. For factuality, we propose the dependency arc entailment model, which decomposes the overall error detection task into smaller entailment tasks that predict whether individual word relationships are entailed by the input. We show that this fine-grained approach to modeling factuality is more effective at detecting errors than standard summary-level approaches, while also providing interpretability benefits. We follow similar principles of fine-grained evaluation when designing our coherence evaluation framework. Here, we show that both human annotation and automatic modeling benefit from a fine-grained treatment of errors, especially when dealing with long text and complex narratives. Furthermore, we show that localization of errors is useful beyond evaluation. Because human annotation for text generation is expensive, we rely on automatically collected training datasets that can be noisy and encourage the very behaviors we want to avoid, e.g., factual errors. We show that error localization can be used to disambiguate between good and noisy signals in existing data and learn to emphasize the former. Overall, we provide multiple training modifications that can be used to train models with specific goals or properties.
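To make the decomposition idea concrete, the following is a minimal sketch in the spirit of arc-level factuality scoring: a summary is broken into head-dependent word relationships, each relationship is scored against the source independently, and the summary-level judgment is aggregated from the per-arc scores. The arc extractor and the scoring heuristic here are toy stand-ins (a real system would use a dependency parser and a trained entailment classifier); the function names and threshold are illustrative, not from the thesis.

```python
def extract_arcs(summary_tokens):
    """Toy stand-in for a dependency parse: pair each token with its neighbor."""
    return list(zip(summary_tokens, summary_tokens[1:]))

def arc_entailment_score(arc, source_vocab):
    """Toy entailment proxy: fraction of the arc's words found in the source.
    A trained model would instead predict entailment of the relationship."""
    head, dep = arc
    return sum(w in source_vocab for w in (head, dep)) / 2.0

def factuality_report(source, summary, threshold=0.75):
    """Score every arc; flag low-scoring arcs as localized likely errors.
    Summary-level factuality is the minimum over arcs, since a single
    unsupported relationship makes the whole summary non-factual."""
    source_vocab = set(source.lower().split())
    arcs = extract_arcs(summary.lower().split())
    scores = [(arc, arc_entailment_score(arc, source_vocab)) for arc in arcs]
    flagged = [arc for arc, s in scores if s < threshold]
    overall = min((s for _, s in scores), default=1.0)
    return overall, flagged

source = "the company reported record profits in 2020"
summary = "the company reported record losses in 2020"
overall, flagged = factuality_report(source, summary)
```

In this example the unsupported word "losses" drags down only the arcs that touch it, so the error is both detected and localized to those arcs, which is the interpretability benefit the fine-grained formulation provides over a single summary-level score.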

