Probabilistic language models with model efficiency and data efficiency

Probabilistic language models have delivered remarkable performance gains in natural language processing (NLP). This dissertation presents new approaches to three facets of language modeling from a probabilistic standpoint. The first concerns the attention mechanisms at the core of transformer architectures: since the advent of the self-attention transformer, attention has underpinned numerous state-of-the-art models. We propose alignment attention, which regularizes the query and key projection matrices within each self-attention layer. Second, although large language models perform impressively across a wide range of applications, they typically demand vast labeled datasets. We propose an active learning method that selects samples for labeling using an acquisition function based on local sensitivity and learning difficulty: it creates replicas of each data point via subtle perturbations and prioritizes the points whose predictive likelihoods diverge most from those of their replicas. Finally, we turn to inference efficiency. We propose a switchable decision mechanism that accelerates inference by dynamically allocating computation to each data instance. By learning where to skip computation and how to balance output quality against computational cost via constrained optimization, our dynamic neural generation networks select an efficient inference path per input and achieve an optimized trade-off. All three approaches consistently outperform established baselines.
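The alignment idea can be illustrated with a minimal sketch. Everything here is an illustrative assumption, not the dissertation's actual regularizer: the function name `alignment_penalty` and the choice of matching the empirical mean and covariance of the query and key vectors are stand-ins for whatever discrepancy measure the real method uses.

```python
import numpy as np

rng = np.random.default_rng(2)

def alignment_penalty(Q, K):
    """Hypothetical alignment regularizer: penalize the discrepancy between
    the empirical mean and covariance of the query vectors Q and key
    vectors K, pushing the two projections to induce aligned distributions."""
    mu_q, mu_k = Q.mean(axis=0), K.mean(axis=0)
    cov_q = np.cov(Q, rowvar=False)
    cov_k = np.cov(K, rowvar=False)
    return np.sum((mu_q - mu_k) ** 2) + np.sum((cov_q - cov_k) ** 2)

n, d_model, d_head = 16, 12, 4
X = rng.standard_normal((n, d_model))         # token representations
W_q = rng.standard_normal((d_model, d_head))  # query projection matrix
W_k = rng.standard_normal((d_model, d_head))  # key projection matrix

# In training, this term would be added to the task loss with some weight,
# e.g. task_loss + lam * alignment_penalty(X @ W_q, X @ W_k).
penalty = alignment_penalty(X @ W_q, X @ W_k)
```

The penalty is zero exactly when the two projected point clouds share first and second moments, so minimizing it nudges the query and key projection matrices toward agreement without forcing them to be identical.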
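The replica-based acquisition step can be sketched as follows. This is a toy illustration under stated assumptions: the linear `toy_model`, the Gaussian perturbation, and the use of KL divergence as the sensitivity measure are all placeholders for the dissertation's actual model and acquisition function.

```python
import numpy as np

rng = np.random.default_rng(0)

def predictive_probs(logits):
    """Numerically stable softmax (stand-in for a model's predictive likelihood)."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def toy_model(x, W):
    """Hypothetical stand-in model: a linear classifier producing class logits."""
    return x @ W

def local_sensitivity_score(x, W, n_replicas=8, eps=0.05):
    """Acquisition score: average KL divergence between the prediction on x
    and the predictions on slightly perturbed replicas of x."""
    p = predictive_probs(toy_model(x, W))
    total = 0.0
    for _ in range(n_replicas):
        x_rep = x + eps * rng.standard_normal(x.shape)  # subtle perturbation
        q = predictive_probs(toy_model(x_rep, W))
        total += np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)))
    return total / n_replicas

# Rank an unlabeled pool: the most locally sensitive points are labeled first.
W = rng.standard_normal((4, 3))
pool = rng.standard_normal((20, 4))
scores = np.array([local_sensitivity_score(x, W) for x in pool])
ranked = np.argsort(-scores)  # indices of pool points, highest score first
```

Points whose predictions are unstable under small input perturbations sit near decision boundaries the model has not yet resolved, which is why prioritizing them for labeling can improve data efficiency.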
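Per-instance computation skipping can be sketched in miniature. The names `gate` and `dynamic_forward`, the tanh residual block, and the linear threshold gate are illustrative assumptions; the dissertation trains such decisions with constrained optimization rather than fixing them by hand.

```python
import numpy as np

rng = np.random.default_rng(1)

def layer(h, W):
    """One hypothetical transformer-style block: a residual nonlinear transform."""
    return h + np.tanh(h @ W)

def gate(h, v, threshold=0.0):
    """Switchable decision: a tiny per-instance policy that scores the current
    hidden state and decides whether the next layer is worth its compute."""
    return float(h @ v) > threshold

def dynamic_forward(x, weights, gates):
    """Run the layer stack, skipping any layer whose gate declines it;
    return the output and the number of layers actually executed."""
    h, layers_used = x, 0
    for W, v in zip(weights, gates):
        if gate(h, v):
            h = layer(h, W)
            layers_used += 1
    return h, layers_used

d, n_layers = 8, 6
weights = [0.1 * rng.standard_normal((d, d)) for _ in range(n_layers)]
gates = [rng.standard_normal(d) for _ in range(n_layers)]

out, used = dynamic_forward(rng.standard_normal(d), weights, gates)
# `used` is at most n_layers: easy instances traverse a shorter inference path.
```

The quality/compute trade-off in the dissertation comes from training the gating decisions under a budget constraint, so that cheap inference paths are chosen only where they do not hurt output quality.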
