Browsing by Subject "Text generation"
Now showing 1 - 2 of 2
Item
Introducing controlled reasoning into autoregressive large language models (2023-05)
Mersinias, Michail; Li, Junyi Jessy; Mahowald, Kyle
In this thesis, we explore two ways to enhance and optimize the text generation process of autoregressive large language models (LLMs), in particular those with a generative pre-trained transformer (GPT) architecture, which we categorize into GPT and InstructGPT model types. In both cases, our proposed methods attempt to replicate human cognitive behavior and introduce System 2 (controlled) reasoning into the text generation process. For GPT models, we explore incorporating natural language inference (NLI) into the text generation pipeline by using a pre-trained NLI model to assess whether a generated sentence entails, contradicts, or is neutral to the prompt and preceding text. First, we show that the NLI task is predictive of generation errors made by GPT-3. We use these results to develop an NLI-informed generation procedure for GPT-J. Then, we evaluate these generations by obtaining human annotations on error types and overall quality. We demonstrate that an NLI strategy of maximizing the neutral class provides the highest quality of generated text, significantly better than the vanilla generations, regardless of nucleus sampling parameter value. For InstructGPT models, we propose constant interaction between two separate instances of the same model: the generator and the critic. We train the critic using Scarecrow, a framework for machine text evaluation which defines ten generation error types. We explore different training procedures and demonstrate that a critic trained with two examples for each error type, as well as chain-of-thought, is highly predictive of generation errors. The critic provides feedback regarding the location, the reason, and the type of each detected error within the generated text. We conclude that, using this feedback, the generator has the potential to correct its own errors and produce text of higher quality.

Item
Neural text summarization with fine-grained control (2022-04-29)
Xu, Jiacheng; Durrett, Greg; Li, Junyi Jessy; Mooney, Raymond J.; Cho, Kyunghyun
Recently, neural network-based approaches have pushed the performance of both extractive and abstractive text summarization models to new heights. Despite these advances, the black-box nature of neural network-based text summarization models makes them hard to interpret and control. We cannot force a model to include or exclude certain pieces of information, nor can we even guarantee that everything it includes is factual. Understanding the model's mechanism and designing methods to control its behavior are the important pieces missing in the current paradigm. In this dissertation, we aim to build summarization systems with fine-grained control. Fine-grained control allows us to understand, assess, and manipulate small pieces of the output of summarization models. By structuring or analyzing the generation of summaries as these small pieces, we can explore paraphrase possibilities or precisely correct factual errors. More specifically, we want (a) fine granularity of spans that can be selected or removed in extractive and abstractive systems; (b) a deep understanding of how the model works, so we can make the generation process more transparent and interpretable and guide the model better; (c) a more flexible and powerful decoding scheme that improves the diversity and quality of generated texts. For extractive systems, we build compressive summarization models to remove undesired spans in the sentences selected by the extractive model. For abstractive systems, we start with a descriptive analysis of the model's generation by measuring and comparing the entropies of different generation steps.
Then, we propose a two-stage framework to fully interpret the step-wise prediction decisions of neural abstractive summarization models. We conduct a comprehensive evaluation of commonly used attribution methods to assess their ability to locate and attribute content from the input. Finally, we present a search algorithm to construct lattices encoding a massive number of generation options for text summarization and machine translation. Two key components, modified best-first search and hypothesis recombination, are developed to fulfill this goal. Our approach with the introduced lattice structure encodes more high-quality candidates than baseline methods, with significantly higher overlap with annotated reference generations. With these models, tools, and algorithms, we systematically gain more control over smaller segments and interpretable units of the generation process. This fine-grained analysis and construction of text generation systems can enable developers and users to select from, filter, calibrate, and examine models' output, empowering a range of possible applications.
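
The summarization abstract above mentions measuring and comparing the entropies of different generation steps. As a rough illustration only, here is a minimal sketch of that kind of measurement. It assumes each decoding step's next-token distribution is available as a plain list of probabilities. The function name `step_entropy` and the toy distributions are hypothetical and are not taken from the dissertation.

```python
import math


def step_entropy(probs):
    """Shannon entropy, in bits, of one next-token probability distribution.

    Zero-probability entries are skipped, since they contribute nothing.
    """
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


# Toy next-token distributions for three decoding steps (made-up numbers,
# not model output): a uniform step, a two-way choice, and a certain step.
steps = [
    [0.25, 0.25, 0.25, 0.25],
    [0.5, 0.5],
    [1.0],
]

# One entropy value per generation step; higher means more uncertainty.
entropies = [step_entropy(p) for p in steps]
for index, value in enumerate(entropies):
    print(f"step {index}: {value:.2f} bits")
```

In this toy setup, a step with a flat distribution has high entropy and a step where one token takes all the probability mass has zero entropy, which is the contrast such a step-wise comparison would surface.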