Browsing by Subject "Molecular evolution"
Now showing 1 - 15 of 15
Item Age-Dependent Evolution of the Yeast Protein Interaction Network Suggests a Limited Role of Gene Duplication and Divergence (Public Library of Science, 2008-11-28) Kim, Wan Kyu; Marcotte, Edward M.
Proteins interact in complex protein–protein interaction (PPI) networks whose topological properties (such as scale-free topology, hierarchical modularity, and disassortativity) have suggested models of network evolution. Currently preferred models invoke preferential attachment or gene duplication and divergence to produce networks whose topology matches that observed for real PPIs, thus supporting these as likely models for network evolution. Here, we show that the interaction density and homodimeric frequency are highly protein age–dependent in real PPI networks in a manner which does not agree with these canonical models. In light of these results, we propose an alternative stochastic model, which adds each protein sequentially to a growing network in a manner analogous to protein crystal growth (CG) in solution. The key ideas are (1) interaction probability increases with availability of unoccupied interaction surface, thus following an anti-preferential attachment rule, (2) as a network grows, highly connected sub-networks emerge into protein modules or complexes, and (3) once a new protein is committed to a module, further connections tend to be localized within that module. The CG model produces PPI networks consistent in both topology and age distributions with real PPI networks and is well supported by the spatial arrangement of protein complexes of known 3-D structure, suggesting a plausible physical mechanism for network evolution.

Item The Ascent of the Abundant: How Mutational Networks Constrain Evolution (Public Library of Science, 2008-07-18) Cowperthwaite, Matthew C.; Economo, Evan P.; Harcombe, William R.; Miller, Eric L.; Meyers, Lauren Ancel
Evolution by natural selection is fundamentally shaped by the fitness landscapes in which it occurs.
Yet fitness landscapes are vast and complex, and thus we know relatively little about the long-range constraints they impose on evolutionary dynamics. Here, we exhaustively survey the structural landscapes of RNA molecules of lengths 12 to 18 nucleotides, and develop a network model to describe the relationship between sequence and structure. We find that phenotype abundance (the number of genotypes producing a particular phenotype) varies in a predictable manner and critically influences evolutionary dynamics. A study of naturally occurring functional RNA molecules using a new structural statistic suggests that these molecules are biased toward abundant phenotypes. This supports an “ascent of the abundant” hypothesis, in which evolution yields abundant phenotypes even when they are not the most fit.

Item Behavior and limitations of molecular and population-level models of evolution (2019-05-08) Sydykova, Dariya; Wilke, C. (Claus); Kirkpatrick, Mark A; Mueller, Peter; Matouschek, Andreas T; Hofmann, Johann A
Numerous models of molecular evolution have been developed over the years. Our knowledge of these models, while extensive in some aspects, is limited in others. Often, models were developed independently of one another, resulting in a lack of understanding of how one model's parameters relate to the parameters in a different model. For example, some models work with codon sequences and others work with amino acid sequences. How the site-wise rates in an amino acid based model translate to the site-wise rates in a codon based model is unknown. It is also unclear how different parameters relate to the underlying evolutionary processes. For example, what physical quantities are measured by the site-wise rate of evolution is unclear. In Chapter 1, I introduce the concepts of modeling evolution and the principles of the inference of site-specific rates of evolution in a protein-coding sequence.
In Chapter 2, I determine the relationship between amino acid models and codon models of evolution. This relationship demonstrates how site-specific rates from a codon model relate to an amino acid model, and the deviations between the two estimates. This relationship also benchmarks empirical amino acid models against more mechanistic codon models. In Chapter 3, I establish how model choice affects maximum likelihood inference of site-specific rate of evolution. I analytically derive the site-wise rate under the assumptions of different substitution models. For the models where an analytically derived solution is not possible, I assess their effect on site-wise rates numerically. In Chapter 4, I demonstrate that when mutation can change epistatic interactions, small populations tend to evolve towards one of the two fixed points of epistasis. Our theory demonstrates that these points can be identified with mutation or drift robustness. Finally, in Chapter 5, I place these results in context, and I discuss future avenues to improve current models of evolution.

Item Bringing Molecules Back into Molecular Evolution (Public Library of Science, 2012-06-28) Wilke, Claus O.
Much molecular-evolution research is concerned with sequence analysis. Yet these sequences represent real, three-dimensional molecules with complex structure and function. Here I highlight a growing trend in the field to incorporate molecular structure and function into computational molecular-evolution work.
I consider three focus areas: reconstruction and analysis of past evolutionary events, such as phylogenetic inference or methods to infer selection pressures; development of toy models and simulations to identify fundamental principles of molecular evolution; and atom-level, highly realistic computational modeling of molecular structure and function aimed at making predictions about possible future evolutionary events.

Item Evolution of structure-function relationships in the GFP-family of proteins (2014-08) Modi, Chintan Kishore; Matz, Mikhail V.
One of the most intriguing questions in evolutionary biology is how biochemical and structural complexity arise through small and incremental changes; however, answering this question requires an explicit set of candidate residues and an experimental system in which to test them. This dissertation aims to understand how biochemical complexity evolves and assesses the structure-function relationship in the green fluorescent protein (GFP) family using an ancestral reconstruction approach. In the second chapter, I studied the evolution of biochemical complexity in Kaede-type red fluorescent proteins (FPs) from Faviina corals. An increase in biochemical complexity is represented by the emergence of red fluorescence because it necessitates the synthesis of a tri-cyclic chromophore from a precursor bi-cyclic chromophore through an additional autocatalytic reaction step. The autocatalytic reaction is fully enabled by as many as twelve historical mutations. Here, I showed that the red fluorescent chromophore evolved from an ancestral green chromophore by perturbing the ancestral protein stability at multiple levels of protein structure. Moreover, only three historical mutations are sufficient to initiate the selection-accessible evolutionary trajectory leading to emergence of red fluorescence.
The third chapter investigates six mutations proximate to the chromophore in the Kaede-type FP that could have facilitated autocatalytic synthesis of the red chromophore by enlarging the chromophore-containing cavity and modifying its microenvironment. Two of these six mutations were found to strongly affect the protein’s stability and oligomeric tendency. Additionally, I showed that the dimeric least divergent Kaede-type FP, R1-2, evolved from the tetrameric green ancestor. Taken together, the results of these studies indicate that the step-up in biochemical complexity in the Kaede-type FPs was achieved via disruption of the existing stable interactions at tertiary and quaternary protein structure levels. In the fourth chapter, I resurrected the common ancestor of all FPs cloned from the order Leptothecata (class Hydrozoa), which are characterized by the highest known homo-oligomeric diversity. I showed that the ancestor was a green monomeric FP with a large Stokes shift. The ancestral FP together with the extant Leptothecata FPs could serve as a model system to study the evolution of function and homo-oligomerization, and the desirable photophysical characteristics would make this ancestral FP a useful bio-marker in bio-medical research.

Item Evolutionary and functional analyses of primate genes reveal critical host-virus interactions (2014-12) Meyerson, Nicholas Ryan; Sawyer, Sara L.; Krug, Robert M; Dudley, Jaquelin P; Bull, James J; Ehrlich, Lauren IR
Viruses exert a tremendous evolutionary pressure on their hosts. By hijacking cellular machinery and resources, viruses have been wildly successful at infecting and propagating throughout all domains of life. In the following dissertation, the interactions between primates and some of the viruses that infect them are examined through an evolutionary lens. I begin by introducing the long-standing battle between mammals and viruses that has raged on for hundreds of millions of years.
I propose a theoretical framework to understand how slowly evolving mammals are able to keep pace with rapidly evolving viruses, and how we might use this framework to monitor future virus outbreaks. The core of my analyses stems from an evolutionary concept known as the host-virus arms race. This tug-of-war for survival between hosts and viruses leaves an imprint in the DNA of each organism involved that can be detected using statistical analyses. In Chapter 2, I describe these analyses in great detail and perform many tests to ensure that they are being used and applied appropriately. The remainder of my studies focuses on detecting novel signatures of positive selection in primate genes that are likely caused by ancient host-virus arms races. I characterize the evolutionary history of several primate genes that have been implicated in viral life cycles and provide functional evidence that viruses drove their rapid divergence. In doing so I make three important discoveries. First, I characterize a genetic variant of CD4, the cellular receptor for HIV-1, in an owl monkey species that could make them a viable HIV-1 model system. Second, I show that gorilla-specific mutations in RANBP2, a gatekeeper of the cell nucleus, can inhibit HIV-1 infection. And finally, evolutionary signatures in TRIM25, a component of the innate immune system, revealed its ability to inhibit influenza A virus replication by binding incoming viral ribonucleoproteins.

Item High throughput directed enzyme evolution using fluorescence activated cell sorting (2003-05) Olsen, Mark Jon; Iverson, Brent L.; Georgiou, George

Item Investigating the behaviors and limitations of phylogenetic models of protein-coding sequence evolution (2016-05) Spielman, Stephanie Jill; Wilke, C.
(Claus); Bull, James; Barrick, Jeffrey; Hillis, David; Hofmann, Hans
Probabilistic models which infer the strength and direction of natural selection from protein-coding sequences are among the most widely-used tools in comparative sequence analysis. A variety of phylogenetic models of coding-sequence evolution have been developed. However, these models have been produced independently from one another. As a consequence, it has been entirely unknown whether inferences from different models reveal similar or incompatible information about the evolutionary process. In this dissertation, I derive and study the mathematical relationship between two probabilistic models of protein-coding sequence evolution: dN/dS-based models, which estimate evolutionary rates, and mutation–selection models, which estimate site-specific amino-acid fitnesses. I demonstrate how this relationship reveals the behavioral properties, limitations, and applicabilities of different inference frameworks, which leads to concrete recommendations for how these models should best be employed in evolutionary sequence analysis. In Chapter 2, I develop flexible and extensible software, implemented as a module in the Python programming language, for simulating sequences along phylogenies according to standard evolutionary models. This software provides an independent and user-friendly platform for testing model behavior, or indeed developing novel evolutionary models, thus enabling robust comparisons of modeling frameworks. In Chapter 3, I derive a mathematical relationship between dN/dS and amino-acid fitness values, and I show that mutation–selection models fully encompass information encoded in dN/dS models, provided that sequences are evolving under purifying selection. I further use this relationship to show that certain commonly-used dN/dS-based models are strongly and systematically biased. I additionally show that standard metrics used for model selection in phylogenetics (e.g.
Akaike Information Criterion) may be positively misleading and indicate strong support for incorrect models. Finally, in Chapter 4, I apply the mathematical relationship developed in Chapter 3 to study the accuracy of two competing mutation–selection inference implementations, whose relative merits have been heavily debated in the literature. My approach demonstrates that mutation–selection inference platforms that treat amino-acid fitnesses as fixed-effect variables precisely estimate site-specific evolutionary constraints. By contrast, inference platforms that treat fitnesses as random-effect variables systematically underestimate the strength of natural selection across sites. Taken together, the work presented in this dissertation yields novel insights into how these popular evolutionary models can best be applied to sequence data, how their results should be interpreted, and finally how future model development should be conducted in order to yield robust and reliable inference methods.

Item Ion channels and the tree of life (2014-12) Liebeskind, Benjamin Joseph; Zakon, Harold; Hillis, David M., 1958-; Aldrich, Richard W.; Hofmann, Hans A.; Matz, Mikhail V.
The field of comparative neurobiology has deep roots. I will begin by giving an overview of the parts of its history that I feel are most relevant for this dissertation. Within this history lies a wealth of zoological research and penetrating theories that are underutilized by modern evolutionary biologists. The age of whole-genome sequencing provides a perfect opportunity to revisit and perhaps update this corpus to better understand the phylogenetic history of organismal behavior. The first three chapters of my dissertation will be case studies on the evolution of sodium-selective ion channels.
Sodium channels are responsible for much of the electrical signaling in animal nervous systems and muscles, but their evolutionary relationships have not yet been explored with the modern tools of phylogenetics and comparative genomics. Chapter 1 will deal with the classic Nav channels which create action potentials in nerves and muscles. There I will show that this gene family pre-dates the nervous system and even animal multicellularity. Chapter 2 will investigate sodium leak channels, which likely create the leak conductance measured by Hodgkin and Huxley. These channels turn out to be close relatives of fungal calcium channels, a relationship which illuminates the evolution of both groups. Chapter 3 is on bacterial sodium channels and their use as models for other sodium channel types. The final chapter will turn away from sodium channels in particular and discuss the evolution of animal nervous systems by means of ion channel genomics. In that chapter I will show that the genomic complements of ion channels that animals with nervous systems possess evolved independently to a large degree, and that the early evolution of nervous systems also involved periods of gene loss. I will end with a more general discussion of convergent evolution, a key theme of this dissertation, and its effect on comparative analyses in the age of genomics.

Item Matrix and tensor decomposition methods as tools to understanding sequence-structure relationships in sequence alignments (2010-12) Muralidhara, Chaitanya; Alter, Orly, 1964-; Gutell, Robin
We describe the use of a tensor mode-1 higher-order singular value decomposition (HOSVD) in the analyses of alignments of 16S and 23S ribosomal RNA (rRNA) sequences, each encoded in a cuboid of frequencies of nucleotides across positions and organisms.
This mode-1 HOSVD separates the data cuboids into combinations of patterns of nucleotide frequency variation across the positions and organisms, i.e., "eigenorganisms" and corresponding nucleotide-specific segments of "eigenpositions," respectively, independent of a priori knowledge of the taxonomic groups and their relationships, or the rRNA structures. We show that this mode-1 HOSVD provides a mathematical framework for modeling the sequence alignments where the mathematical variables, i.e., the significant eigenpositions and eigenorganisms, are consistent with current biological understanding of the 16S and 23S rRNAs. First, the significant eigenpositions identify multiple relations of similarity and dissimilarity among the taxonomic groups, some known and some previously unknown. Second, the corresponding eigenorganisms identify positions of nucleotides exclusively conserved within the corresponding taxonomic groups, but not among them, that map out entire substructures inserted or deleted within one taxonomic group relative to another. These positions are also enriched in adenosines that are unpaired in the rRNA secondary structure, the majority of which participate in tertiary structure interactions, and some also map to the same substructures. This demonstrates that an organism's evolutionary pathway is correlated and possibly also causally coordinated with insertions or deletions of entire rRNA substructures and unpaired adenosines, i.e., structural motifs which are involved in rRNA folding and function. Third, this mode-1 HOSVD reveals two previously unknown subgenic relationships of convergence and divergence between the Archaea and Microsporidia, that might correspond to two evolutionary pathways, in both the 16S and 23S rRNA alignments.
This demonstrates that even on the level of a single rRNA molecule, an organism's evolutionary pathway is composed of different types of changes in structure in reaction to multiple concurrent evolutionary forces.

Item Plastid genome rearrangement, gene loss, and sequence divergence in geraniaceae, passifloraceae, and annonaceae. (2013-12) Blazier, John Christensen; Jansen, Robert K., 1954-
Plastid genomes of flowering plants are largely identical in gene order and content, but a few lineages have been identified with many gene and intron losses, genomic rearrangements, and accelerated rates of nucleotide substitutions. These aberrant lineages present an opportunity to understand the modes of selection acting on these genomes as well as their long-term stability. My research has focused on two areas within plastid genome evolution in Geraniaceae: first, an investigation of the diversity of unusual plastid genomes in a single genus, Erodium (Geraniaceae) for chapters one and three. Chapter two focuses on the evolution of subunits of the plastid-encoded RNA polymerase (PEP). The first chapter described the loss of plastid-encoded NADPH dehydrogenase (ndh) genes from a clade of 13 Erodium species. Divergence time estimates indicate this clade is less than 5 million years old. This recent loss of ndh genes in Erodium presents an opportunity to investigate changes in photosynthetic function through comparative biochemistry between Erodium species with and without plastid-encoded ndh genes. Second, I examined the evolution of the gene encoding the alpha subunit (rpoA) of PEP in three disparate angiosperm lineages, Pelargonium (Geraniaceae), Passiflora (Passifloraceae), and Annonaceae, in which this gene has diverged so greatly that it is barely recognizable. PEP is conserved in the plastid genomes of all photosynthetic angiosperms.
I found multiple lines of evidence indicating that the genes remain functional despite retaining only ~30% sequence identity with rpoA genes from outgroups. The genomes containing these divergent rpoA genes have undergone significant rearrangement due to illegitimate recombination and gene conversion, and I hypothesized that these phenomena have also driven the divergence of rpoA. Third, I conducted a survey of plastid genome evolution in Erodium with the completion of 15 additional whole genomes. Except for Erodium and some legumes, all angiosperm plastid genomes share a quadripartite structure with large and small single copy regions (LSC, SSC) and two inverted repeats (IR). I discovered a species of Erodium that has re-formed a large inverted repeat. Demonstrating a precedent for loss and regain of the IR also impacts models of evolution for other highly rearranged plastid genomes.

Item Rapid Evolution of Coral Proteins Responsible for Interaction with the Environment (Public Library of Science, 2011-05-25) Voolstra, Christian R.; Sunagawa, Shinichi; Matz, Mikhail V.; Bayer, Till; Aranda, Manuel; Buschiazzo, Emmanuel; DeSalvo, Michael K.; Lindquist, Erika; Szmant, Alina M.; Coffroth, Mary Alice; Medina, Mónica
Background -- Corals worldwide are in decline due to climate change effects (e.g., rising seawater temperatures), pollution, and exploitation. The ability of corals to cope with these stressors in the long run depends on the evolvability of the underlying genetic networks and proteins, which remain largely unknown. A genome-wide scan for positively selected genes between related coral species can help to narrow down the search space considerably. Methodology/Principal Findings -- We screened a set of 2,604 putative orthologs from EST-based sequence datasets of the coral species Acropora millepora and Acropora palmata to determine the fraction and identity of proteins that may experience adaptive evolution. 7% of the orthologs show elevated rates of evolution.
Taxonomically-restricted (i.e. lineage-specific) genes show a positive selection signature more frequently than genes that are found across many animal phyla. The class of proteins that displayed elevated evolutionary rates was significantly enriched for proteins involved in immunity and defense, reproduction, and sensory perception. We also found elevated rates of evolution in several other functional groups such as management of membrane vesicles, transmembrane transport of ions and organic molecules, cell adhesion, and oxidative stress response. Proteins in these processes might be related to the endosymbiotic relationship corals maintain with dinoflagellates in the genus Symbiodinium. Conclusion/Relevance -- This study provides a bird's-eye view of the processes potentially underlying coral adaptation, which will serve as a foundation for future work to elucidate the rates, patterns, and mechanisms of corals' evolutionary response to global climate change.

Item The role of structure in protein evolution (2014-12) Meyer, Austin Garig; Wilke, C. (Claus)
Identifying sites under evolutionary pressure and predicting the effects of substitutions at those sites are among the greatest standing problems in bioinformatics and computational biology. Moreover, the two problems have traditionally been separated by the enormous chasm that exists between molecular evolutionary biologists interested in the evolutionary process and theoretical chemists interested in free energy changes. As a result, identifying sites under selective pressure has most often left out any semblance of structural biology and biochemistry; likewise, theoretical chemistry tends to rely strictly on first principles calculations rather than thinking first about biologically simple and interpretable results. Here, I have tried to integrate these two intuitions with regard to protein function and evolution.
First, I developed a model that implements structural measurements into a traditional structure-blind molecular evolutionary model. This structure-aware model performs significantly better at identifying sites under both purifying and diversifying selection than its structure-blind counterpart. Second, I go further to understand the extent to which structural features of any kind can predict the evolutionary process. By comparing site-wise evolution between human and avian influenza, I find that structural features can account for 24% to 36% of the evolutionary pressure on influenza hemagglutinin. Third, I developed a computational method based on first principles molecular dynamics simulations to predict the biological effect of substitutions in the Machupo virus–human receptor protein–protein interface. I found that relatively simple energetic proxies offer a reasonable substitute for rigorous free energy calculations; such simple proxies could allow non-experts to naively implement first principles methods without being forced to consider all possible degrees of freedom for post hoc calculations.

Item A snapshot of the unity and diversity of biological systems at the level of chemistry : structural and mechanistic studies of Cg10062, a homologue of cis-3-chloroacrylic acid dehalogenase, FG41 malonate semialdehyde decarboxylase and the catalytic domain of pyruvate dehydrogenase phosphatase 1 (2010-05) Guo, Youzhong, 1974-; Hackert, Marvin L.; Whitman, Christian P.; Zhang, Zhiwen; Fast, Walter L.; Liu, Hung-wen
The tautomerase superfamily is composed of a group of proteins characterized by two key features: the N-terminal proline and a beta-alpha-beta-motif. This superfamily has been divided into five families represented by 4-oxalocrotonate tautomerase (4-OT), 5-(carboxymethyl)-2-hydroxymuconate isomerase (CHMI), cis-3-chloroacrylic acid dehalogenase (cis-CaaD), malonate semialdehyde decarboxylase (MSAD), and macrophage migration inhibitory factor (MIF).
Cg10062 is a homologue of cis-CaaD, but has several biochemical properties distinct from those of cis-CaaD. For example, Cg10062 can be irreversibly inhibited by (R)- or (S)-oxirane-2-carboxylate, whereas cis-CaaD can only be irreversibly inhibited by (R)-oxirane-2-carboxylate. FG41MSAD is a homologue of MSAD, with comparable decarboxylase activity but missing Arg-73, which is known to be crucial for the MSAD activity. In order to understand the unique biochemical characteristics of Cg10062 and FG41MSAD, we have solved five crystal structures. These crystal structures have established a solid structural basis for understanding the mechanisms of their activities. The eukaryotic protein phosphatases are composed of a group of proteins that are responsible for reversible phosphorylation. The eukaryotic protein phosphatases have been divided into three families, the phosphoprotein phosphatase (PPP) family, the protein phosphatase Mg2+- or Mn2+-dependent (PPM) family and the protein Tyr phosphatase (PTP) family. PDP1 is a member of the PPM family. PDP1 is also an important component of the large pyruvate dehydrogenase complex (PDC) which catalyzes the decarboxylation of pyruvate to yield acetyl-CoA with the accompanying reduction of NAD+. In order to understand the mechanism by which it dephosphorylates its target protein, we have solved the structure of the catalytic domain of PDP1. Analysis of these structures in the light of their evolutionary contexts enables us to appreciate the unity and diversity of the biological systems at the chemical level and helps us solve interesting problems, such as the possible physiological functions for some members within the tautomerase superfamily.

Item Towards a predictive functional synthesis : routing top-down and bottom-up approaches in biology (2020-08-13) Cole, Austin Woodrow; Ellington, Andrew D.; Bull, James; Barrick, Jeffrey; Pogue, Gregory; Davies, Bryan
Nothing in evolution makes sense except in the light of biology.
The collection of experiments described here attempts to engineer biology to influence evolution; this builds on and inverts an approach to parse historical evolutionary realities through molecular experimentation. Several different avenues are described here to prospectively anticipate the patterns of evolution, and a small selection of informative studies that set the stage for these experiments are presented in chapter one. Chapters two and three detail how engineered codon tables may bias evolution under two different types of selection. These chapters speak to the rigidity of the existing codon table and the broad adaptive potential that exists within the current set of 20 amino acids. Chapters four and five restrict their focus to engineer the genomes of single-stranded viral individuals rather than amino acid repertoires, and then lay out how these synthetic genomes adapt in response to designed architectures. Lastly, chapter six describes new tools to amalgamate ‘bottom-up’ chemical information and then uses an algorithm embedding that information to guide the engineering of improved protein phenotypes. Taken together, these approaches form an anthology of strategies that use biological engineering to prospectively bias the adaptation of populations.