Medical text simplification and an evaluation of factuality
In this thesis, we introduce the problem of paragraph simplification in the medical domain. We construct a dataset for this task and introduce a novel method of training sequence-to-sequence simplification models using the unlikelihood training framework, demonstrating that this method produces simplifications that are easier to read. Motivated by the presence of factual errors in our paragraph simplification model's outputs, we develop an annotation scheme to identify such errors in datasets and model outputs for the simpler task of sentence-level simplification. Crowdsourced annotations using this scheme show that factual errors, especially the deletion of important information, are pervasive in both the datasets and the system outputs. Finally, we show that existing measures of semantic similarity do not adequately capture factual errors in sentence simplification, indicating that further work is needed to effectively identify and mitigate factual errors in text simplification.