Browsing by Subject "Protein complexes"

All-by-all discovery of conserved protein complexes by deep proteome fractionation
(2016-08) Borgeson, Blake Charles; Marcotte, Edward M.; Ellington, Andrew; Wallingford, John; Wilke, Claus; Iyer, Vishwanath
Stable assemblies of proteins, known as protein complexes, execute a large fraction of cellular processes required to sustain life. A functional and mechanistic understanding of these assemblies will provide a more comprehensive understanding of an organism's genes and elucidate a more complete picture of cellular processes, particularly those involved in development, aging and disease. While recent progress has mapped protein complexes in budding yeast and some bacteria, efforts in animals are restricted to subsets of the proteome, leaving most animal protein complexes undetermined. Co-fractionation offers compelling efficiency gains in identifying pairwise protein interactions and complexes, but it requires significant computational effort to fully exploit. In this work, I describe the computational methods and infrastructure I developed to identify conserved protein interactions and complexes from a massive set of mass spectrometry data from nine species, and the computational and biological analysis I performed with my collaborators. These efforts include building a mostly automated pipeline to process and integrate large quantities of mass spectrometry data from multiple species and developing improved methods to predict co-complex interactions and cluster them into complexes. The conserved animal complex map produced using this pipeline and methodology has already yielded dividends in supporting biological discoveries. Scaling the approach more broadly will enable rapid mapping of the previously uncharted interactomes in any chosen species.
Conservation and comparison of protein interactions across evolution
(2020-05) McWhite, Claire Darnell; Marcotte, Edward M.; Browning, Karen; Cambronne, Xiaolu Lulu; Wallingford, John; Wilke, Claus
We share core molecular systems with organisms across the tree of life. Through studying multiple organisms at once, we can find biological signals that rise above experimental noise. Comparative evolutionary analysis is an approach that combines evolutionary and systems biology: in essence, a parallel assay where each organism is a data point. Repeated observation of a feature across multiple organisms increases confidence that the signal reflects biology. However, integrating data from different species adds a layer of complexity to high-throughput analysis. In this dissertation, I will first present a review of the use of comparative evolution in human disease genetics, then a characterization and comparison of different orthology algorithms, and finally the culmination of this work: a high-throughput discovery of stable protein complexes conserved across plants.
Machine learning methods for community detection in networks using known community information
(2022-08-11) Palukuri, Meghana Venkata; Marcotte, Edward M.; Ward, Rachel; Schulz, Karl W; Elber, Ron; Wilke, Claus O
In a network, the problem of community detection refers to finding groups of nodes and edges that form ‘communities’ relevant to the field, such as groups of people with common interests in social networks and fraudulent websites linked to each other on the web. Community detection also yields downstream use-cases such as the summarization of massive networks into smaller networks of communities. We are most interested in mining protein complexes, i.e., communities of interacting proteins, to accelerate biological experiments by providing candidates for previously unknown protein complexes. Characterization of protein complexes is important, as they play essential roles in cellular functions and their disruption often leads to disease. Most previous community detection methods are unsupervised graph clustering strategies, which assume that communities are dense subgraphs in a network, an assumption that does not always hold. Also, many community detection algorithms are in-memory and serial and do not scale to large networks. In this dissertation, we use knowledge from communities, including rich features from graph nodes, with supervised and reinforcement learning to improve accuracy, and with parallel algorithms to ensure high performance and scalability. Specifically, we work on (1) learning a community fitness function using supervised machine learning methods with AutoML; (2) a distributed algorithm for finding candidate communities using multiple heuristics; (3) learning to walk trajectories on a network leading to communities with reinforcement learning; and (4) feature augmentation with graph node information, such as images and additional graph node embeddings. While we optimize our algorithms on protein complexes, which have characteristics such as overlapping membership and differing topologies, our methods are generalizable to other domains since they learn and use characteristics of communities to predict new communities.
Further, in domains with limited known information, the algorithms we develop can be applied by transferring learned knowledge such as dense community fitness functions from other domains. In conclusion, we build Super.Complex, RL complex detection, and DeepSLICEM, three accurate, efficient, scalable, and generalizable community detection algorithms that effectively utilize known community information with different machine learning methods, and we also present three evaluation measures for accurate community evaluation.
Network-based strategies for discovering functional associations of uncharacterized genes and gene sets
(2012-12) Wang, Peggy I.; Marcotte, Edward M.
High-throughput technology is changing the face of research biology, generating an ever-growing number of large-scale data sets. With experiments utilizing next-generation gene sequencing, mass spectrometry, and various other global surveys of proteins, translating the plethora of data into biology has become a daunting task. In response, functional networks have been developed as a means for integrating the data into models of proteomic organization. In these networks, proteins are linked if there is evidence that they operate together in the same function, facilitating predictions about the functions, phenotypes, and disease associations of uncharacterized genes. In this body of work, we explore different applications of this so-called "guilt-by-association" concept to predict loss-of-function phenotypes and diseases associated with genes in yeast, worm, and human. We also scrutinize certain limitations associated with the functional networks, predictive methods, and measures of performance used in our studies. Importantly, the predictive method and performance measure, if not chosen appropriately for the biological objective at hand, can largely distort the results and interpretation of a study. These findings are incorporated in the development of RIDDLE, a method for characterizing whole sets of genes. This machine learning-based method provides a measure of network distance, and thus functional association, between two sets of genes. RIDDLE may be applied to a wide range of problems, as we demonstrate with several biological examples, including linking microRNA-450a to ocular development and disease. In the last decade, functional networks have proven to be a useful strategy for interpreting large-scale proteomic and genomic data sets. With the continued growth of genome coverage in networks and the innovation of predictive methods, we will surely advance towards our ultimate goal of understanding the genetic changes that underlie disease.
Scoring functions for protein docking and drug design
(2014-05) Viswanath, Shruthi; Elber, Ron
Predicting the structure of complexes formed by two interacting proteins is an important problem in computational structural biology. Proteins perform many of their functions by binding to other proteins. The structure of protein-protein complexes provides atomic details about protein function and biochemical pathways, and can help in designing drugs that inhibit binding. Docking computationally models the structure of protein-protein complexes, given three-dimensional structures of the individual chains. Protein docking methods have two phases. In the first phase, a comprehensive, coarse search is performed for optimally docked models. In the second refinement and reranking phase, the models from the first phase are refined and reranked, with the expectation of extracting a small set of accurate models from the pool of thousands of models obtained from the first phase. In this thesis, new algorithms are developed for the refinement and reranking phase of docking. New scoring functions, or potentials, that rank models are developed. These potentials are learnt using large-scale machine learning methods based on mathematical programming. The procedure for learning these potentials involves examining hundreds of thousands of correct and incorrect models. In this thesis, hierarchical constraints were introduced into the learning algorithm. First, an atomic potential was developed using this learning procedure. A refinement procedure involving side-chain remodeling and conjugate gradient-based minimization was introduced. The refinement procedure combined with the atomic potential was shown to improve docking accuracy significantly. Second, a hydrogen bond potential was developed. Molecular dynamics-based sampling combined with the hydrogen bond potential improved docking predictions. Third, mathematical programming compared favorably to SVMs and neural networks in terms of accuracy, training and test time for the task of designing potentials to rank docking models.
The methods described in this thesis are implemented in the docking package DOCK/PIERR. DOCK/PIERR was shown to be among the best automated docking methods in community-wide assessments. Finally, DOCK/PIERR was extended to predict membrane protein complexes. A membrane-based score was added to the reranking phase, and shown to improve the accuracy of docking. This docking algorithm for membrane proteins was used to study the dimers of amyloid precursor protein, implicated in Alzheimer's disease.
The protein organization of a red blood cell
(2022-05-19) Sae-Lee, Momo Wisath; Marcotte, Edward M.; Wallingford, John; Ippolito, Gregory; Georgiou, George; Matouschek, Andreas; Marcotte, Edward
Red blood cells (RBCs, erythrocytes) are the simplest primary human cells, lacking nuclei and major organelles, and instead employing about a thousand proteins to dynamically control cellular function and morphology in response to physiological cues. In this study, we defined a canonical RBC proteome and interactome using quantitative mass spectrometry and machine learning. Our data reveal an RBC interactome dominated by protein homeostasis, redox biology, cytoskeletal dynamics, and carbon metabolism. We validated protein complexes through electron microscopy and chemical crosslinking, and with these data, built 3D structural models of the ankyrin/Band 3/Band 4.2 complex that bridges the spectrin cytoskeleton to the RBC membrane. The model suggests spring-like compression of ankyrin may contribute to the characteristic RBC cell shape and flexibility. Taken together, our study provides an in-depth view of the global protein organization of human RBCs and serves as a comprehensive resource for future research.