Improving methods of evolutionary inference
Our understanding of the natural world depends largely on our ability to extract information from biological sequence data with computational models. Building useful models is an iterative process that alternates between understanding the biological systems of interest, along with the natural variation they produce, and assessing a given model's ability to characterize those observations. We address various aspects of this modeling process in three distinct application areas that use biological sequence data, seeking either to improve or to assess the performance of currently available models. In Chapter 2, we characterize diversity at individual sites in protein sequence alignments and present a low-dimensional representation for use in evolutionary models. In Chapter 3, we assess the performance of a deep learning model relative to conventional statistical models for estimating population recombination rate and identify fundamental information limits. In Chapter 4, we investigate the influence of environmental factors on the well-established paradigm relating gene expression to codon usage bias and highlight the importance of considering growth rate in future models. Together, this work presents potential improvements to evolutionary models in these three contexts.