Predicting Subcellular Locations of Conserved Eukaryotic Protein Families with Co-Fractionation Mass Spectrometry Data
Abstract
Classifying proteins by their subcellular locations is important for gaining insight into their functions and for understanding the dynamics of a cell. While some proteins of eukaryotic organisms such as humans and mice are well characterized, many proteins conserved across the entire Eukaryota domain remain uncharacterized. One recent development for detecting the physical association of proteins is co-fractionation mass spectrometry (CFMS). In this method, protein mixtures undergo multiple separations based on the proteins' physical and biochemical properties, and the proteins in the resulting fractions are then identified by mass spectrometry. We used data from previous CFMS experiments on 31 eukaryotic organisms to build a machine learning model that predicts the subcellular locations of conserved protein families. The model uses the elution profiles generated by CFMS as features and subcellular location annotations from QuickGO as truth labels. We then used the trained model to predict the subcellular locations of protein families that have been detected by CFMS but lack subcellular location annotations. Our results demonstrate that CFMS data distinguishes the subcellular locations of deeply conserved protein families with acceptable performance and distinguishes ribosomal from non-ribosomal proteins exceptionally well.