Exploring protein biochemistry with deep learning


Deep learning has become widely used in the biological sciences. In particular, protein deep learning models have leveraged the ever-growing collection of biological data to learn the patterns that govern protein biochemistry. Here, we assess different protein deep learning models to better understand their respective capabilities, benefits, and drawbacks. Our work aims to provide insights for future protein engineering efforts and for the discovery of protein homologs. In Chapter 2, we assess the ability of a structure-based protein machine learning (ML) model to make biochemically meaningful predictions and test whether the model can predict which specific amino acids are allowed in a protein. We compare the performance of models trained on different input sizes and correlate model predictions with natural variation to better understand how these models learn protein structure and biochemistry. In Chapter 3, we compare the predictions of two structure-based models and two language models to determine whether different protein representations affect what information each model type learns and how well it performs. Finally, in Chapter 4, we apply a sequence-based protein model to search bacterial genomes for antibacterial microcin peptides.