Browsing by Subject "Natural language processing"
Now showing 1 - 20 of 44
Item: Addressing the brittleness of knowledge-based question-answering (2009-12)
Chaw, Shaw Yi; Porter, Bruce, 1956-; Barker, Kenneth J.; Mooney, Raymond; Novak, Gordon S.; Markman, Art
Knowledge base systems are brittle when their users are unfamiliar with the knowledge base's content and structure. Querying a knowledge base requires users to state their questions in precise and complete formal representations that relate the facts in the question to relevant terms and relations in the underlying knowledge base. This requirement places a heavy burden on users to become deeply familiar with the contents of the knowledge base, and it prevents novice users from using the knowledge base effectively for problem solving. As a result, the utility of knowledge base systems is often restricted to the developers themselves. The goal of this work is to help users who may possess little domain expertise use unfamiliar knowledge bases for problem solving. Our thesis is that the difficulty in using unfamiliar knowledge bases can be addressed by an approach that funnels natural questions, expressed in English, into formal representations appropriate for automated reasoning. The approach uses a simplified-English controlled language, a domain-neutral ontology, a set of mechanisms to handle a handful of well-known question types, and a software component, called the Question Mediator, to identify relevant information in the knowledge base for problem solving. With our approach, a user can pose questions in simplified English against a variety of unfamiliar knowledge bases and retrieve the information relevant to their problem. We studied the thesis in the context of a system called ASKME. We evaluated ASKME on the task of answering exam questions for college-level biology, chemistry, and physics. The evaluation consists of successive experiments to test whether ASKME can help novice users employ unfamiliar knowledge bases for problem solving. The initial experiment measures ASKME's performance under ideal conditions, where the knowledge base is built and used by the same knowledge engineers. Subsequent experiments measure ASKME's performance under increasingly realistic conditions. In the final experiment, we measure ASKME's performance under conditions where the knowledge base is independently built by subject matter experts and the users are a group of novices unfamiliar with the knowledge base. Results from the evaluation show that ASKME works well on different knowledge bases and answers a broad range of questions posed by novice users in a variety of domains.

Item: Advances in statistical script learning (2017-08-08)
Pichotta, Karl; Mooney, Raymond J. (Raymond Joseph); Chambers, Nathanael; Erk, Katrin; Stone, Peter
When humans encode information into natural language, they do so with the clear assumption that the reader will be able to seamlessly make inferences based on world knowledge. For example, given the sentence "Mrs. Dalloway said she would buy the flowers herself," one can make a number of probable inferences based on event co-occurrences: she bought flowers, she went to a store, she took the flowers home, and so on. Observing this, it is clear that many different useful natural language end-tasks could benefit from models of events as they typically co-occur (so-called script models).
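A minimal sketch of such a count-based script model, with invented toy events rather than the dissertation's actual data or models:

```python
from collections import Counter, defaultdict

# Toy documents as sequences of (predicate, argument) events, e.g. as might
# be extracted from "Mrs. Dalloway said she would buy the flowers herself."
docs = [
    [("buy", "flowers"), ("go", "store"), ("take", "flowers")],
    [("buy", "flowers"), ("go", "store"), ("pay", "cashier")],
    [("go", "store"), ("buy", "flowers"), ("take", "flowers")],
]

# Count adjacent event pairs: a bigram-style script model.
follows = defaultdict(Counter)
for doc in docs:
    for prev, nxt in zip(doc, doc[1:]):
        follows[prev][nxt] += 1

def predict_next(event, k=2):
    """Return the k events observed most often after `event`."""
    return follows[event].most_common(k)

print(predict_next(("buy", "flowers")))
# -> [(('go', 'store'), 2), (('take', 'flowers'), 1)]
```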
Robust question-answering systems must be able to infer highly probable implicit events from what is explicitly stated in a text, as must robust information-extraction systems that map from unstructured text to formal assertions about the relations expressed in the text. Coreference resolution, semantic role labeling, and even syntactic parsing systems could, in principle, benefit from event co-occurrence models. To this end, we present a number of contributions related to statistical event co-occurrence models. First, we investigate a method of incorporating multiple entities into events in a count-based co-occurrence model. We find that modeling multiple entities interacting across events allows for improved empirical performance on the task of modeling sequences of events in documents. Second, we give a method of applying Recurrent Neural Network sequence models to the task of predicting held-out predicate-argument structures from documents. This model allows us to easily incorporate entity noun information and can allow for more complex, higher-arity events than a count-based co-occurrence model. We find the neural model improves performance considerably over the count-based co-occurrence model. Third, we investigate the performance of a sequence-to-sequence encoder-decoder neural model on the task of predicting held-out predicate-argument events from text. This model does not explicitly model any external syntactic information and does not require a parser. We find the text-level model to be competitive in predictive performance with an event-level model directly mediated by an external syntactic analysis. Finally, motivated by this result, we investigate incorporating features derived from these models into a baseline noun coreference resolution system. We find that, while our additional features do not appreciably improve top-level performance, we can nonetheless provide empirical improvement on a number of restricted classes of difficult coreference decisions.

Item: Analyzing group behavior from language use with natural language processing and experimental methods: three applications in political science and sociology (2018-10-10)
Brown, Christopher Henry; Beaver, David I., 1966-; Bannard, Colin; Beavers, John T.; Erk, Katrin; Lease, Matt
This dissertation presents three independent research projects with the common goal of analyzing and understanding group behavior from naturally occurring text, applying Natural Language Processing (NLP) and experimental methods in the domains of political science, sociology, and cognitive science. The first project develops a case study examining a grassroots initiative to bring an Ohio anti-labor bill to a state-wide referendum. Social media platforms like Twitter present new opportunities for researchers to listen in on natural conversations, but this data is unstructured and too large for qualitative or manual analysis. I demonstrate the use of NLP and Machine Learning tools to identify opinions and extract trends from this text data, while addressing pitfalls and biases of these methods. The second project describes issues with measuring impact and influence using traditional citation analysis, and demonstrates how incorporating full-text data improves citation network models. Citation analysis arose to address the need to quantify, filter, and rank scientific publications as they outgrew any single researcher's ability to comprehensively survey all literature relevant to their research.
The problem is that most citation metrics are based solely on network metadata: they operate under the assumption that every citation connotes the same amount of influence as any other, completely ignoring text content. I investigate textual features and comparison metrics indicative of citation relationships, and use my citation prediction system to demonstrate that even simple methods can improve citation models beyond the typical binary cited-or-not network. Finally, the third project examines how individuals' beliefs change upon receiving new information. Multiple factors affect this behavior, such as the reliability of the source and the believability or coherence of the information, but there is no one-size-fits-all model describing how people are influenced by new information. I present a novel experimental design to measure belief and confidence change, and show that increased reliability of new information boosts confidence, and that higher confidence decreases the likelihood of changing one's beliefs. The results also suggest some counter-intuitive behaviors: reliability has no discernible effect on willingness to change one's belief, disagreement is more influential than agreement, and prior confidence has a non-linear effect on how new information changes confidence.

Item: Building effective representations for domain adaptation in coreference resolution (2018-05-04)
Lestari, Victoria Anugrah; Durrett, Greg
Over the past few years, research in coreference resolution, one of the core tasks in Natural Language Processing, has displayed significant improvement. However, domain adaptation for coreference resolution remains largely unexplored; Moosavi and Strube [2017] have shown that the performance of state-of-the-art coreference resolution systems drops when the systems are tested on datasets from different domains. We modify e2e-coref [Lee et al., 2017], a state-of-the-art coreference resolution system, to perform well on new domains by adding sparse linguistic features, incorporating information from Wikipedia, and adding a domain-adversarial network to the system. Our experiments show that each modification improves the precision of the system. We train the model on the CoNLL-2012 datasets and test it on several datasets: WikiCoref, the pt documents, and the wb documents from CoNLL-2012. Our best models gain 0.50, 0.52, and 1.14 F1 over the baselines on the respective test sets.

Item: Changing group dynamics through computerized language feedback (2012-08)
Tausczik, Yla Rebecca; Pennebaker, James W.; Cormack, Lawrence K.; Gosling, Samuel D.; Graesser, Arthur C.; Henderson, Marlone D.
Why do some groups of people work well together while others do not? It is commonly accepted that effective groups communicate well. Yet one of the biggest roadblocks facing the study of group communication is that it is extremely difficult to capture real-world group interactions and analyze the words people use in a timely manner. This project overcame this limitation in two ways. First, a broader and more systematic study of group processes was conducted by using a computerized text analysis program (Linguistic Inquiry and Word Count) that automatically codes natural language using pre-established rules. Groups that work well together typically exchange more knowledge and establish good social relationships, which is reflected in the way that they use words. The group dynamics of over 500 student discussion groups interacting via group chat were assessed by studying their language use.
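The word-counting approach described above can be sketched in a few lines; the categories and word lists here are invented stand-ins for LIWC's actual dictionaries:

```python
# Minimal LIWC-style word counting: report what fraction of a message's
# words fall into pre-established categories. Word lists are toy stand-ins.
CATEGORIES = {
    "social":    {"we", "us", "our", "you", "team"},
    "cognitive": {"think", "know", "because", "reason"},
    "positive":  {"good", "great", "agree", "thanks"},
}

def category_profile(message):
    words = [w.strip(".,!?").lower() for w in message.split()]
    total = len(words) or 1
    return {cat: sum(w in vocab for w in words) / total
            for cat, vocab in CATEGORIES.items()}

print(category_profile("We agree, and we think the plan is great!"))
# -> {'social': 0.22..., 'cognitive': 0.11..., 'positive': 0.22...}
```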
Second, a language feedback system was built to experimentally test the effect of certain group processes on group satisfaction and performance. It is now possible to provide language feedback by processing natural language dialogue using computerized text analysis in real time. The language feedback system can change the way the group works by providing individualized recommendations. In this way it is possible to manipulate group processes naturalistically. Together these studies provided evidence that important group processes can be detected even using simplistic natural language processing, and preliminary evidence that providing real-time feedback based on the words students use in a group discussion can improve learning by changing how the group works together.

Item: Characterizing content addition and explanation generation in document-level text simplification (2020-08-17)
Srikanth, Neha Pundlik; Li, Junyi Jessy; Durrett, Greg
Text simplification has remained an important task in computational linguistics for many years. Much of text simplification research focuses on modeling sentence simplification, addressing operations such as deletion, reordering, sentence splitting, and substitution, while research advancements in document-level simplification have been fairly limited. This work introduces a phenomenon in document-level simplification called elaborative simplification, the insertion of content to make simplified texts easier to understand. We analyze the nature of elaborative simplification using a new corpus we collect, and illustrate its wide spectrum of contextual specificity, ranging from simple definitions to multi-step reasoning. We introduce two new modeling tasks, contextual specificity prediction and elaboration generation, and explore the capability of large-scale pre-trained language models to generate a range of contextually specific elaborations.

Item: A computational model of language pathology in schizophrenia (2010-12)
Grasemann, Hans Ulrich; Miikkulainen, Risto; Hoffman, Ralph E.; Mooney, Raymond J.; Love, Bradley C.; Ballard, Dana H.; Kuipers, Benjamin J.
No current laboratory test can reliably identify patients with schizophrenia. Instead, key symptoms are observed via language, including derailment, where patients cannot follow a coherent storyline, and delusions, where false beliefs are repeated as fact. Brain processes underlying these and other symptoms remain unclear, and characterizing them would greatly enhance our understanding of schizophrenia. In this situation, computational models can be valuable tools to formulate testable hypotheses and to complement clinical research. This dissertation aims to capture the link between biology and schizophrenic symptoms using DISCERN, a connectionist model of human story processing. Competing illness mechanisms proposed to underlie schizophrenia are simulated in DISCERN, and are evaluated at the level of narrative language, the same level used to diagnose patients. The result is the first simulation of a speaker with schizophrenia. Of all illness models, hyperlearning, a model of overly intense memory consolidation, produced the best fit to patient data, as well as compelling models of delusions and derailments.
If validated experimentally, the hyperlearning hypothesis could advance the current understanding of schizophrenia and provide a platform for simulating the effects of future treatments.

Item: Computational modeling of politeness across diverse languages (2023-04-26)
Srinivasan, Anirudh; Choi, Eunsol
We study politeness phenomena in nine typologically diverse languages. Politeness is an important facet of communication and is sometimes argued to be culture-specific, yet existing computational linguistic studies are limited to English. We create TyDiP, a dataset containing three-way politeness annotations for 500 examples in each language, totaling 4.5K examples. We evaluate how well multilingual models can identify politeness levels: they show a fairly robust zero-shot transfer ability, yet fall significantly short of estimated human accuracy. We further study mapping the English politeness strategy lexicon into nine languages via automatic translation and lexicon induction, analyzing whether each strategy's impact stays consistent across languages. Lastly, we empirically study the complicated relationship between formality and politeness through transfer experiments. We hope our dataset will support various research questions and applications, from evaluating multilingual models to constructing polite multilingual agents. The data and code are publicly available on GitHub (https://github.com/Genius1237/TyDiP) and HuggingFace (https://huggingface.co/datasets/Genius1237/TyDiP).

Item: Continually improving grounded natural language understanding through human-robot dialog (2018-04-23)
Thomason, Jesse David; Mooney, Raymond J. (Raymond Joseph); Stone, Peter; Niekum, Scott; Tellex, Stefanie
As robots become ubiquitous in homes and workplaces such as hospitals and factories, they must be able to communicate with humans. Several kinds of knowledge are required to understand and respond to a human's natural language commands and questions. If a person asks an assistant robot to "take me to Alice's office," the robot must know that Alice is a person who owns some unique office, and that "take me" means it should navigate there. Similarly, if a person requests "bring me the heavy, green mug," the robot must have accurate mental models of the physical concepts "heavy," "green," and "mug." To avoid forcing humans to use key phrases or words robots already know, this thesis focuses on helping robots understand new language constructs through interactions with humans and with the world around them. To understand a command in natural language, a robot must first convert that command to an internal representation that it can reason with. Semantic parsing is a method for performing this conversion, and the target representation is often a semantic form expressed as predicate logic with lambda calculus. Traditional semantic parsing relies on hand-crafted resources from a human expert: an ontology of concepts, a lexicon connecting language to those concepts, and training examples of language with abstract meanings. One thrust of this thesis is to perform semantic parsing with sparse initial data. We use the conversations between a robot and human users to induce pairs of natural language utterances with the target semantic forms a robot discovers through its questions, reducing the annotation effort of creating training examples for parsing. We use this data to build more dialog-capable robots in new domains with much less expert human effort (Thomason et al., 2015; Padmakumar et al., 2017).
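As a toy illustration of the kind of mapping a semantic parser produces (the lexicon entries and predicate names below are invented, and the thesis learns such mappings from dialog rather than hand-writing them):

```python
# Toy lexicon-driven mapping from a command to a logical form.
LEXICON = {
    "take me to": "navigate_to({})",
    "bring me":   "bring(speaker, {})",
}
ENTITIES = {
    "alice's office":       "office(alice)",
    "the heavy, green mug": "mug(heavy, green)",
}

def parse(command):
    command = command.lower().rstrip(".")
    for phrase, template in LEXICON.items():
        if command.startswith(phrase):
            argument = command[len(phrase):].strip()
            return template.format(ENTITIES.get(argument, argument))
    return None

print(parse("Take me to Alice's office"))      # navigate_to(office(alice))
print(parse("Bring me the heavy, green mug"))  # bring(speaker, mug(heavy, green))
```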
Meanings of many language concepts are bound to the physical world. Understanding object properties and categories, such as "heavy," "green," and "mug," requires interacting with and perceiving the physical world. Embodied robots can use manipulation capabilities, such as pushing, picking up, and dropping objects, to gather sensory data about them. This data can be used to understand non-visual concepts like "heavy" and "empty" (e.g. "get the empty carton of milk from the fridge"), and assist with concepts that have both visual and non-visual expression (e.g. tall things look big and also exert force sooner than short things when pressed down on). A second thrust of this thesis focuses on strategies for learning these concepts using multi-modal sensory information. We use human-in-the-loop learning to get labels between concept words and actual objects in the environment (Thomason et al., 2016, 2017). We also explore ways to tease out polysemy and synonymy in concept words (Thomason and Mooney, 2017) such as "light," which can refer to a weight or a color, the latter sense being synonymous with "pale." Additionally, pushing, picking up, and dropping objects to gather sensory information is prohibitively time-consuming, so we investigate strategies for using linguistic information and human input to expedite exploration when learning a new concept (Thomason et al., 2018). Finally, we build an integrated agent with both parsing and perception capabilities that learns from conversations with users to improve both components over time. We demonstrate that parser learning from conversations (Thomason et al., 2015) can be combined with multi-modal perception (Thomason et al., 2016) using predicate-object labels gathered through opportunistic active learning (Thomason et al., 2017) during those conversations to improve performance for understanding natural language commands from humans. Human users also qualitatively rate this integrated learning agent as more usable after it has improved from conversation-based learning.

Item: Creating a low resource semantic parser for the unified meaning representation format (2021-04-30)
Wanna, Selma Liliane; Landsberger, Sheldon; Pryor, Mitchell Wayne
This thesis investigates the performance of state-of-the-art neural models on a low-resource semantic parsing task. This task required the models to convert natural language commands directed at a robot into Unified Meaning Representation Format (UMRF) structures. UMRF structures are standalone Meaning Representation (MR) containers that support embedding predicate-argument semantics and graphical MR formats. The structure was designed for semi-autonomous systems in Human-Robot Interaction (HRI) domains. Because the UMRF formalism is new, annotated UMRF data, and hence training data, is scarce. For this project, the Examine in light task from the ALFRED dataset was selected as the corpus from which to annotate labeled UMRF training and validation examples; 1,010 training and 100 validation examples were collected. Thereafter, the following models were tested on the low-resource semantic parsing task: sequence-to-sequence, CopyNet, and transformer architectures. Of the three designs, the CopyNet model performed the best, with a BLEU score of 0.891 and an accuracy of 61.3%.
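For readers unfamiliar with the reported metrics, here is a simplified sketch of BLEU (n-gram precision with a brevity penalty) and exact-match accuracy; real evaluations use a full BLEU implementation:

```python
import math
from collections import Counter

def simple_bleu(reference, hypothesis, max_n=2):
    """Simplified BLEU: geometric mean of 1..max_n n-gram precisions,
    scaled by a brevity penalty."""
    ref, hyp = reference.split(), hypothesis.split()
    precisions = []
    for n in range(1, max_n + 1):
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        overlap = sum((ref_ngrams & hyp_ngrams).values())  # clipped matches
        precisions.append(overlap / max(sum(hyp_ngrams.values()), 1))
    if min(precisions) == 0:
        return 0.0
    brevity = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)

gold = "navigate_to ( office ( alice ) )"
pred = "navigate_to ( office ( bob ) )"
print(round(simple_bleu(gold, pred), 3))  # partial credit under BLEU
print(gold == pred)                       # exact-match accuracy for one pair
```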
Once the design was finalized, the CopyNet model was integrated into a ROS2 software package, allowing the larger robotics community to access the semantic parser.

Item: Data driven strategies for product design (2022-01-19)
Karanam, Subrahmanyam Aditya; Barua, Anitesh; Agarwal, Ashish; Saar-Tsechansky, Maytal; Sonnier, Garrett
Online platforms have revolutionized the development and distribution of products. For instance, digital platforms have incentivized product developers to supply a large number of products by providing appropriate infrastructure and APIs, which has led to the creation of a long tail. However, product developers face a significant challenge in generating demand. Further, online platforms have eased interaction between producers and consumers in the form of product reviews. Consumer suggestions expressed in these reviews can help improve product design. Moreover, these online interactions of producers and consumers have generated large volumes of unstructured data, which can be highly valuable for obtaining consumer insights. This research addresses each of these issues. In the first essay, we investigate an appropriate design strategy to make a product publicly visible given its position in the demand distribution. Our results suggest that social features help increase the demand for tail apps and are also useful for head apps in informing users about new intrinsic features. In the second essay, we examine the demand impact of firm and user product development ideas. We find that while developer-initiated innovative features and user-suggested imitative features help increase demand, the impact of user-suggested innovative features is negative. In the third essay, we develop a custom deep learning model to extract user-suggested features from product reviews. These works not only extend the literature in several streams of research, notably product design, long tail, innovation, imitation, and user suggestions, but also provide actionable insights for product developers.

Item: Dialog as a vehicle for lifelong learning of grounded language understanding systems (2020-09-14)
Padmakumar, Aishwarya; Mooney, Raymond J. (Raymond Joseph); Stone, Peter; Niekum, Scott; Chai, Joyce
Natural language interfaces have the potential to make various forms of technology, including mobile phones and computers as well as robots or other machines such as ATMs and self-checkout counters, more accessible and less intimidating to users who are unfamiliar or uncomfortable with other types of interfaces. In particular, natural language understanding systems on physical robots face a number of challenges, including the need to ground language in perception, to adapt to changes in the environment and novel uses of language, and to deal with uncertainty in understanding. To effectively handle these challenges, such systems need to perform lifelong learning, continually updating the scope and predictions of the model through user interactions. In this thesis, we discuss ways in which dialog interaction with users can be used to improve grounded natural language understanding systems, motivated by service robot applications. We focus on two types of queries that can be used in such dialog systems: active learning queries to elicit knowledge about the environment that can be used to improve perceptual models, and clarification questions that confirm the system's hypotheses or elicit specific information required to complete a task.
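A toy decision rule contrasting the two query types; the confidence scores and thresholds are invented for illustration:

```python
# Toy policy: execute when confident, clarify when uncertain between
# hypotheses, and otherwise spend an active learning query.
def choose_action(hypotheses, active_learning_budget):
    """hypotheses: list of (command_interpretation, confidence) pairs."""
    best, confidence = max(hypotheses, key=lambda h: h[1])
    if confidence > 0.9:
        return f"EXECUTE: {best}"
    if confidence > 0.5:             # clarification question
        return f"CLARIFY: Did you mean '{best}'?"
    if active_learning_budget > 0:   # opportunistic active learning query
        return "ACTIVE-LEARN: Can you show me an object that is 'heavy'?"
    return "CLARIFY: Could you rephrase your request?"

print(choose_action([("bring(mug)", 0.62), ("bring(carton)", 0.31)], 2))
# -> CLARIFY: Did you mean 'bring(mug)'?
```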
Our goal is to build a system that can learn how to interact with users, balancing quick completion of the tasks desired by the user against asking additional active learning questions to improve the underlying grounded language understanding components. We present work on jointly improving semantic parsers from, and learning a dialog policy for, clarification dialogs that improve a robot's ability to understand natural language commands. We introduce the framework of opportunistic active learning, where a robot introduces queries that may not be immediately relevant into an interaction in the hope of improving performance in future interactions. We demonstrate the usefulness of this framework in learning to ground natural language descriptions of objects, and learn a dialog policy for such interactions. We also learn dialog policies that balance task completion, opportunistic active learning, and attribute-based clarification questions. Finally, we attempt to expand this framework to different types of underlying models of grounded language understanding.

Item: Discovering latent structures in syntax trees and mixed-type data (2016-05-04)
Sun, Liang, 1988-; Dimitrov, Nedialko B.; Baldridge, Jason; Hasenbein, John; Khajavirad, Aida; Scott, James
Gibbs sampling is a widely applied algorithm for estimating parameters in statistical models. This thesis uses Gibbs sampling to solve practical problems, especially in natural language processing and mixed-type data, through three independent studies. The first study presents a Bayesian model for learning latent annotations. The technique is capable of parsing sentences in a wide variety of languages, producing results that are on par with or surpass previous approaches in accuracy, and shows promising potential for parsing low-resource languages. The second study presents a method to automatically complete annotations from partially annotated sentence data, with the help of Gibbs sampling. The algorithm significantly reduces the time required to annotate sentences for natural language processing, without a significant drop in annotation accuracy. The last study proposes a novel factor model for uncovering latent factors and exploring covariation among multiple outcomes of mixed types, including binary, count, and continuous data. Gibbs sampling is used to estimate model parameters. The algorithm successfully discovers correlation structures of mixed-type data in both simulated and real-world data.

Item: Exploring multiple perspectives to mitigate cognitive biases through an integrated interface to language models (2024-05)
Wong, Yian; Lease, Matthew A.; Li, Junyi Jessy
In recent years, large language models (LLMs) have demonstrated remarkable abilities in generating human-like text and supporting decision-making processes. However, their use is often limited by inherent biases and a lack of diversity in presented perspectives. This work introduces a novel system designed to mitigate these issues by leveraging the capabilities of LLMs to simulate a multi-perspective debate format, aimed at providing a balanced view on controversial topics. The proposed system employs a unique integrated interface that facilitates dynamic interactions between multiple AI-generated personas, each representing distinct viewpoints. These personas engage in structured debates, allowing for a comprehensive exploration of a topic that counteracts the cognitive biases typically associated with single-perspective information retrieval systems.
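The orchestration pattern behind such a system can be sketched as follows; `generate` is a hypothetical stand-in for an LLM completion call, and no specific vendor API is implied:

```python
# Skeleton of a multi-persona debate loop over a shared transcript.
PERSONAS = {
    "Proponent": "Argue in favor of the motion, citing evidence.",
    "Skeptic":   "Argue against the motion, probing weaknesses.",
    "Moderator": "Summarize both sides neutrally and note open questions.",
}

def generate(prompt):
    return "[model output elided]"  # replace with a real LLM client call

def debate(topic, rounds=2):
    transcript = []
    for _ in range(rounds):
        for name, role in PERSONAS.items():
            history = "\n".join(f"{who}: {turn}" for who, turn in transcript)
            prompt = f"Topic: {topic}\nRole: {role}\n{history}\n{name}:"
            transcript.append((name, generate(prompt)))
    return transcript

for who, turn in debate("Should cities ban cars downtown?", rounds=1):
    print(f"{who}: {turn}")
```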
The system incorporates advanced prompt engineering techniques and retrieval-augmented generation to ensure the accuracy and relevance of the information presented. Additionally, the interface is designed with user engagement in mind, featuring interactive elements that allow users to manipulate the debate dynamics and contribute to the discussion. This thesis evaluates the system's effectiveness in enhancing users' understanding of complex issues and its potential for reducing bias in decision support systems. By simulating diverse viewpoints, the system potentially fosters more critical and informed engagement with topics, thus supporting better decision-making.

Item: Global models for temporal relation classification (2008-12)
Ponvert, Elias Franchot; Baldridge, Jason
Temporal relation classification is one of the most challenging areas of natural language processing. Advances in this area have direct relevance to improving practical applications, such as question-answering and summarization systems, as well as informing theoretical understanding of temporal meaning realization in language. With the development of annotated textual materials, this domain is now accessible to empirical, machine-learning-oriented approaches, where systems treat temporal relation processing as a classification problem: a decision about which label (before, after, identity, etc.) to assign to a pair (i, j) of event indices in a text. Most reported systems in this new research domain utilize classifiers that effectively make decisions in isolation, without explicitly utilizing the decisions made about other indices in a document. In this work, we present a new strategy for temporal relation classification that utilizes global models of temporal relations in a document, choosing the optimal classification for all pairs of indices in a document subject to global constraints, which may be linguistically motivated. We propose and evaluate two applications of global models to temporal semantic processing: joint prediction of situation entities with temporal relations, and temporal relation prediction guided by global coherence constraints.

Item: Grounded language learning models for ambiguous supervision (2013-12)
Kim, Joo Hyun, active 2013; Mooney, Raymond J. (Raymond Joseph)
Communicating with natural language interfaces is a long-standing goal for artificial intelligence (AI) agents. One core issue toward this goal is "grounded" language learning, a process of learning the semantics of natural language with respect to relevant perceptual inputs. In order to ground the meanings of language in real-world situations, computational systems are trained with data in the form of natural language sentences paired with relevant but ambiguous perceptual contexts. Such ambiguous supervision requires resolving the ambiguity between a natural language (NL) sentence and a corresponding set of possible logical meaning representations (MRs). In this thesis, we focus on devising effective models for simultaneously disambiguating such supervision and learning the underlying semantics of language to map NL sentences into proper logical MRs. We present probabilistic generative models for learning such correspondences, along with a reranking model to improve performance further. First, we present a probabilistic generative model that learns the mappings from NL sentences into logical forms where the true meaning of each NL sentence is one of a handful of candidate logical MRs.
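A drastically simplified stand-in for this disambiguation step is to choose, for each sentence, the candidate MR whose symbols overlap its words most; the thesis's generative models instead learn probabilistic NL-MR correspondences. The data below is invented, in the style of sportscasting commentary:

```python
# Pick the candidate MR with the greatest word/symbol overlap per sentence.
data = [
    ("purple7 passes to pink3", ["pass(purple7, pink3)", "kick(purple7)"]),
    ("purple7 kicks",           ["turnover(pink3)", "kick(purple7)"]),
]

def symbols(mr):
    return set(mr.replace("(", " ").replace(")", " ").replace(",", " ").split())

for sentence, candidates in data:
    words = set(sentence.split())
    best = max(candidates, key=lambda mr: len(words & symbols(mr)))
    print(f"{sentence!r} -> {best}")
# 'purple7 passes to pink3' -> pass(purple7, pink3)
# 'purple7 kicks'           -> kick(purple7)
```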
The model simultaneously disambiguates the meaning of each sentence in the training data and learns to probabilistically map an NL sentence to its corresponding MR form, depicted in a single tree structure. We perform evaluations on the RoboCup sportscasting corpus, showing that our model is more effective than those proposed by previous researchers. Next, we describe two PCFG induction models for grounded language learning that extend the previous grounded language learning model of Börschinger, Jones, and Johnson (2011). Börschinger et al.'s approach works well in situations of limited ambiguity, such as in the sportscasting task. However, it does not scale well to highly ambiguous situations where there are large sets of potential meaning possibilities for each sentence, such as in the navigation instruction-following task first studied by Chen and Mooney (2011). The two models we present overcome such limitations by employing a learned semantic lexicon as a basic correspondence unit between NL and MR for PCFG rule generation. Finally, we present a method of adapting discriminative reranking to grounded language learning in order to improve the performance of our proposed generative models. Although such generative models are easy to implement and are intuitive, it is not always the case that generative models perform best, since they maximize the joint probability of the data and the model, rather than directly maximizing conditional probability. Because we do not have gold-standard references for training a secondary conditional reranker, we incorporate weak supervision from evaluations against the perceptual world during the process of improving model performance. All these approaches are evaluated on two publicly available domains that have been actively used in many other grounded language learning studies. Our methods demonstrate consistently improved performance over those of previous studies in domains with different languages, showing that our methods are language-independent and can be generally applied to other grounded learning problems as well. Further possible applications of the presented approaches include summarized machine translation tasks and learning from real perception data assisted by computer vision and robotics.

Item: Identifying lexical relationships and entailments with distributional semantics (2017-06-19)
Roller, Stephen Creig; Erk, Katrin; Mooney, Raymond J.; Miikkulainen, Risto; Padó, Sebastian
Many modern efforts in Natural Language Understanding depend on rich and powerful semantic representations of words. Systems for sophisticated logical and textual reasoning often depend heavily on lexical resources to provide critical information about relationships between words, but these lexical resources are expensive to create and maintain, and are never fully comprehensive. Distributional Semantics has long offered methods for automatically inducing meaning representations from large corpora, with little or no annotation effort. The resulting representations are valuable proxies of semantic similarity, but simply knowing two words are similar cannot tell us their relationship, or whether one entails the other. In this thesis, we consider how methods from Distributional Semantics may be applied to the difficult task of lexical entailment, where one must predict whether one word implies another. We approach this through contributions in the areas of hypernymy detection, lexical relationship prediction, lexical substitution, and textual entailment.
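One classic distributional signal for the direction of entailment is feature inclusion, sketched here on invented count vectors; cosine similarity, being symmetric, cannot distinguish "dog entails animal" from the reverse:

```python
import numpy as np

# Toy context-count vectors over four contexts (runs, barks, meows, pet).
dog    = np.array([4.0, 5.0, 0.0, 3.0])
animal = np.array([5.0, 5.0, 5.0, 4.0])

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def inclusion(u, v):
    """Fraction of u's feature mass also present in v: high when u's
    contexts are included in v's, suggesting u entails v."""
    return float(np.minimum(u, v).sum() / u.sum())

print(cosine(dog, animal))     # symmetric relatedness
print(inclusion(dog, animal))  # dog -> animal: 1.0
print(inclusion(animal, dog))  # animal -> dog: ~0.63
```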
We propose novel experimental setups, models, analyses, and interpretations, which ultimately provide a better understanding of both the nature of lexical entailment and the information available within distributional representations.

Item: In-process diagnostic methods for entity representation learning on sequential data at scale (2022-08-05)
Garcia-Olano, Diego; Ghosh, Joydeep; Wallace, Byron; Dimakis, Alex; Wang, Atlas; Vikalo, Haris
The performance gains and expanded utilization of deep learning models in machine learning and natural language processing have been followed by a need for the internal mechanisms guiding them to be explainable, and to be accompanied by methods allowing humans to diagnose and correct such models at inference time if needed. In contrast to post-hoc methods for explainability, which train a secondary model to infer the decision reasoning of a primary model using only its inputs and outputs, in-process methods offer faithful explanations of a model's decisions by explicitly training the model to include such capabilities as an additional objective, rather than trying to infer them in a post-hoc manner. Such methods should scale without sacrificing model performance and be sufficiently broad to incorporate diverse tasks and data types, including sequential language, time series, and multi-modal data. Of particular interest is the analysis of such techniques for learning rich dense or sparse interpretable entity representations tied to knowledge bases. In this thesis we address these aims by developing efficient frameworks that handle different data types and provide diverse, in-process explainable techniques for transparent and trustworthy models. First, we show that it is feasible to learn dense entity representations from text via a dual encoder framework that encodes mentions and entities in the same dense vector space. Such representations can then be used for extremely fast entity linking, where candidate entities are retrieved by approximate nearest-neighbor search, and they generalize well to new datasets. During training, the model leverages a novel negative mining algorithm that guides learning by iteratively constructing training batches to contain the top candidates previously ranked, incorrectly, above the true entity. The technique dramatically improves model accuracy over iterations, and the final batches can be viewed as the samples most difficult for the model to learn. We then introduce a framework for learning in-process prototypes from an autoencoder that provides both instance-level and global explanations for time series classification. We explicitly optimize for increased prototype diversity, which improves model accuracy and produces prototypes generated by learning regions of the latent space that highlight features the model uses for distinguishing among classes. We show that the prototypes are capable of learning real-world features: in our case study, ECG morphology related to bradycardia. Next, we derive Biomedical Interpretable Entity Representations (BIERs), in which dimensions correspond to fine-grained entity types, and values are predicted probabilities that a given entity is of the corresponding type. We propose a diagnostic method that exploits BIER's final sparse and intermediate dense representations to facilitate model and entity-type debugging, and show BIERs achieve strong performance in biomedical tasks including named entity disambiguation and entity linking.
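The dual-encoder retrieval step can be sketched with random vectors standing in for trained encoder outputs; in training, entities ranked above the gold entity would serve as the hard negatives described above:

```python
import numpy as np

# Mentions and entities share one vector space; linking is nearest-neighbor
# search by inner product. Random unit vectors stand in for encoder outputs.
rng = np.random.default_rng(0)
entity_names = ["aspirin", "ibuprofen", "acetaminophen"]
entity_vecs = rng.normal(size=(3, 64))
entity_vecs /= np.linalg.norm(entity_vecs, axis=1, keepdims=True)

def link(mention_vec, k=1):
    scores = entity_vecs @ mention_vec
    top = np.argsort(-scores)[:k]
    return [(entity_names[i], float(scores[i])) for i in top]

# A mention whose encoding lands near "ibuprofen" in the shared space:
mention = entity_vecs[1] + rng.normal(scale=0.05, size=64)
print(link(mention))  # [('ibuprofen', ...)]
```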
We next propose a method for entity-based knowledge injection for the multimodal Knowledge-Based Visual Question Answering (KBVQA) task, which contains questions whose answers explicitly require external knowledge about named entities within an image, and study how it affects both task accuracy and an existing in-process, bi-modal explainability technique. Our results show substantially improved performance on the KBVQA task without the need for additional costly pre-training, and we provide insights into when entity knowledge injection helps improve a model's understanding. Finally, we introduce Intermediate enTity-based Sparse Interpretable Representation Learning (ItsIRL), an architecture that allows for fine-tuning of sparse, interpretable entity representations (IERs) on downstream tasks while preserving the semantics of the dimensions learned during pretraining. This approach surpasses prior IER work and achieves competitive performance with dense models on biomedical tasks. We propose and study "counterfactual" entity-type manipulation techniques, made possible by our architecture, that allow ItsIRL errors to be fixed and can surpass the performance of dense, non-interpretable models. Additionally, we propose a method to construct entity-type-based class prototypes for showing global semantic properties learned by our model, for both positive and negative instances.

Item: Inducing grammars from linguistic universals and realistic amounts of supervision (2015-05)
Garrette, Daniel Hunter; Baldridge, Jason; Mooney, Raymond J. (Raymond Joseph); Ravikumar, Pradeep; Scott, James G.; Smith, Noah A.
The best-performing NLP models to date are learned from large volumes of manually annotated data. For tasks like part-of-speech tagging and grammatical parsing, high performance can be achieved with plentiful supervised data. However, such resources are extremely costly to produce, making them an unlikely option for building NLP tools in under-resourced languages or domains. This dissertation is concerned with reducing the annotation required to learn NLP models, with the goal of opening up the range of domains and languages to which NLP technologies may be applied. In this work, we explore the possibility of learning from a degree of supervision that is at or close to the amount that could reasonably be collected from annotators for a particular domain or language that currently has none. We show that just a small amount of annotation input, even that which can be collected in just a few hours, can provide enormous advantages if we have learning algorithms that can appropriately exploit it. This work presents new algorithms, models, and approaches designed to learn grammatical information from weak supervision. In particular, we look at ways of intersecting a variety of different forms of supervision in complementary ways, thus lowering the overall annotation burden. Sources of information include tag dictionaries, morphological analyzers, constituent bracketings, and partial tree annotations, as well as unannotated corpora. For example, we present algorithms that are able to combine faster-to-obtain type-level annotation with unannotated text to remove the need for slower-to-obtain token-level annotation. Much of this dissertation describes work on Combinatory Categorial Grammar (CCG), a grammatical formalism notable for its use of structured, logic-backed categories that describe how each word and constituent fits into the overall syntax of the sentence.
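A toy illustration of CCG's slash categories and their application combinators (real parsing uses more combinators; this shows only the category mechanics):

```python
# Toy CCG slash categories with forward (>) and backward (<) application,
# e.g. a transitive verb has category (S\NP)/NP.
def forward(left, right):
    """X/Y applied to Y yields X."""
    if "/" in left:
        x, y = left.rsplit("/", 1)
        if y.strip("()") == right.strip("()"):
            return x.strip("()")
    return None

def backward(left, right):
    r"""Y followed by X\Y yields X."""
    if "\\" in right:
        x, y = right.rsplit("\\", 1)
        if y.strip("()") == left.strip("()"):
            return x.strip("()")
    return None

vp = forward("(S\\NP)/NP", "NP")  # "likes" + "Mary" -> S\NP
print(vp)                         # S\NP
print(backward("NP", vp))         # "John" + S\NP -> S
```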
This work shows how linguistic universals intrinsic to the CCG formalism itself can be encoded as Bayesian priors to improve learning.

Item: Introducing controlled reasoning into autoregressive large language models (2023-05)
Mersinias, Michail; Li, Junyi Jessy; Mahowald, Kyle
In this thesis, we explore two ways to enhance and optimize the text generation process of autoregressive large language models (LLMs), in particular those with a generative pre-trained transformer (GPT) architecture, which we categorize into GPT and InstructGPT model types. In both cases, our proposed methods attempt to replicate human cognitive behavior and introduce System 2 (controlled) reasoning into the text generation process. For GPT models, we explore incorporating natural language inference (NLI) into the text generation pipeline by using a pre-trained NLI model to assess whether a generated sentence entails, contradicts, or is neutral to the prompt and preceding text. First, we show that the NLI task is predictive of generation errors made by GPT-3. We use these results to develop an NLI-informed generation procedure for GPT-J. Then, we evaluate these generations by obtaining human annotations on error types and overall quality. We demonstrate that an NLI strategy of maximizing the neutral class provides the highest quality of generated text, significantly better than the vanilla generations, regardless of the nucleus sampling parameter value. For InstructGPT models, we propose constant interaction between two separate instances of the same model: the generator and the critic. We train the critic using Scarecrow, a framework for machine text evaluation which defines ten generation error types. We explore different training procedures and demonstrate that a critic trained with two examples for each error type, together with chain-of-thought prompting, is highly predictive of generation errors. The critic provides feedback regarding the location, reason, and type of each detected error within the generated text. We conclude that, using this feedback, the generator has the potential to correct its own errors and produce text of higher quality.
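The NLI-informed strategy of maximizing the neutral class can be sketched as rerank-by-neutrality; `sample_continuations` and `nli_probs` below are hypothetical stand-ins for an LLM sampler and a pre-trained NLI classifier:

```python
# Sample candidate continuations, score each against the context with an
# NLI model, and keep the candidate whose "neutral" probability is highest.
def sample_continuations(context, n):
    return [f"[candidate {i}]" for i in range(n)]  # placeholder sampler

def nli_probs(premise, hypothesis):
    # Placeholder: a real NLI model returns P(entailment), P(neutral),
    # P(contradiction) for the (premise, hypothesis) pair.
    return {"entailment": 0.2, "neutral": 0.7, "contradiction": 0.1}

def generate_next_sentence(context, n=8):
    candidates = sample_continuations(context, n)
    return max(candidates, key=lambda c: nli_probs(context, c)["neutral"])

print(generate_next_sentence("The meeting ran long."))
```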