Exploring protein biochemistry with deep learning

dc.contributor.advisor: Wilke, C. (Claus)
dc.contributor.committeeMember: Davies, Bryan W.
dc.contributor.committeeMember: Klivans, Adam R.
dc.contributor.committeeMember: Russell, Rick
dc.creator: Kulikova, Anastasiya Vitalievna
dc.date.accessioned: 2024-04-18T00:36:22Z
dc.date.available: 2024-04-18T00:36:22Z
dc.date.issued: 2023-12
dc.date.submitted: December 2023
dc.date.updated: 2024-04-18T00:36:23Z
dc.description.abstract: Deep learning has become widely used in the biological sciences. More specifically, the development of protein deep learning models has leveraged the ever-growing collection of biological data to learn the patterns that govern protein biochemistry. Here, we focus on the assessment of different protein deep learning models to better understand their respective capabilities, benefits, and drawbacks. Our work aims to provide insights for future protein engineering efforts and for the discovery of protein homologs. In Chapter 2, we assessed the ability of a structure-based protein ML model to make biochemically meaningful predictions and tested whether the model can predict the specific amino acids allowed in a protein. We compared the performance of models trained on different input sizes and correlated model predictions with natural variation in order to better understand how these models learn protein structure and biochemistry. In Chapter 3, we compared the predictions of two structure models and two language models to determine whether different protein representations affect what information each model type learns and how well it performs. Finally, in Chapter 4, we applied a sequence-based protein model to the search for antibacterial microcin peptides in bacterial genomes.
dc.description.department: Cellular and Molecular Biology
dc.format.mimetype: application/pdf
dc.identifier.uri:
dc.identifier.uri: https://hdl.handle.net/2152/124869
dc.identifier.uri: https://doi.org/10.26153/tsw/51471
dc.language.iso: en
dc.subject: Convolutional neural networks
dc.subject: Large language models
dc.subject: Microcins
dc.subject: Transformers
dc.subject: Machine learning
dc.subject: Proteins
dc.subject: Antimicrobial peptides
dc.subject: Protein engineering
dc.subject: Semantic search
dc.subject: Homology search
dc.subject: Protein biochemistry
dc.subject: Biochemistry
dc.title: Exploring protein biochemistry with deep learning
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Cellular and Molecular Biology
thesis.degree.discipline: Cell and Molecular Biology
thesis.degree.grantor: The University of Texas at Austin
thesis.degree.name: Doctor of Philosophy
thesis.degree.program: Cell and Molecular Biology

Access full-text files

Original bundle

Name: KULIKOVA-PRIMARY-2024-1.pdf
Size: 8.72 MB
Format: Adobe Portable Document Format
