Browsing by Subject "RNA alignment"
Improving the prediction of RNA secondary structure and automatic alignment of RNA sequences (2012-05)
Gardner, David Paul; Gutell, Robin; Ren, Pengyu; Browning, Karen; Russell, Rick; Makarov, Dmitrii E.; Miranker, Daniel

The accurate prediction of an RNA secondary structure from its sequence will enhance experimental design and interpretation for the growing number of scientists who study RNA. While the computer programs that make these predictions have improved, further improvements are needed, particularly for larger RNAs. The first major section of this dissertation is concerned with improving the prediction accuracy of RNA secondary structures by generating new energetic parameters and evaluating a new RNA folding model. Statistical potentials for hairpin and internal loops produce significantly higher prediction accuracy than nine other folding programs. While more improvements can be made to the energetic parameters used by secondary structure folding programs, I believe that a new approach is also necessary. I describe an RNA folding model that is predicated on a large body of computational and experimental work. This model includes energetics, contact distance, competition, and a folding pathway. Each component of this folding model is evaluated and its validity substantiated. The statistical potentials were created with comparative analysis, which requires highly accurate multiple RNA sequence alignments. The second major section of this dissertation is focused on my template-based sequence aligner, CRWAlign. Multiple sequence aligners generally run into problems when the pairwise sequence identity drops too low.
By utilizing multiple dimensions of data to establish a profile for each position in a template alignment, CRWAlign is able to align new sequences with high accuracy, even for pairs of sequences with low identity.

RAxML and FastTree: Comparing Two Methods for Large-Scale Maximum Likelihood Phylogeny Estimation (Public Library of Science, 2011-11-21)
Liu, Kevin; Linder, C. Randal; Warnow, Tandy

Statistical methods for phylogeny estimation, especially maximum likelihood (ML), offer high accuracy with excellent theoretical properties. However, RAxML, the current leading method for large-scale ML estimation, can require weeks or longer when used on datasets with thousands of molecular sequences. Faster methods for ML estimation, among them FastTree, have also been developed, but their performance relative to RAxML is not yet fully understood. In this study, we compare FastTree and RAxML with respect to ML score, running time, and topological accuracy on thousands of alignments (based on both simulated and biological nucleotide datasets) with up to 27,634 sequences. We find that when RAxML and FastTree are constrained to the same running time, FastTree produces much more topologically accurate trees in almost all cases. We also find that when RAxML is allowed to run to completion, it provides an advantage over FastTree in terms of the ML score, but does not produce substantially more accurate tree topologies. Interestingly, the relative accuracy of trees computed using FastTree and RAxML depends in part on the accuracy of the sequence alignment and on dataset size, so that FastTree can be more accurate than RAxML on large datasets with relatively inaccurate alignments. Finally, the running times of RAxML and FastTree are dramatically different: when run to completion, RAxML can take several orders of magnitude longer than FastTree.
Thus, our study shows that very large phylogenies can be estimated very quickly using FastTree, with little (and in some cases no) degradation in tree accuracy compared to RAxML.

Structural Constraints Identified with Covariation Analysis in Ribosomal RNA (Public Library of Science, 2012-06-19)
Shang, Lei; Xu, Weijia; Ozer, Stuart; Gutell, Robin R.

Covariation analysis is used to identify positions with similar patterns of sequence variation in an alignment of RNA sequences. These constraints on the evolution of two positions are usually associated with a base pair in a helix. While mutual information (MI) has been used to accurately predict an RNA secondary structure and a few of its tertiary interactions, early studies revealed that phylogenetic event counting methods are more sensitive and provide extra confidence in the prediction of base pairs. We developed a novel and powerful phylogenetic event counting method (PEC) for quantifying positional covariation with the Gutell lab’s new RNA Comparative Analysis Database (rCAD). The PEC and MI-based methods each identify unique base pairs, and jointly identify many other base pairs. Combined with an N-best and helix-extension strategy, the two methods together identify the maximal number of base pairs. While covariation methods have effectively and accurately predicted RNA secondary structures, only a few tertiary structure base pairs have been identified. Analyses presented here and at the Gutell lab’s Comparative RNA Web (CRW) Site reveal that the majority of these tertiary base pairs do not covary with one another. However, covariation analysis does reveal a weaker but significant covariation between sets of nucleotides that are in proximity in the three-dimensional RNA structure.
This shows that covariation analysis identifies other types of structural constraints beyond the two nucleotides that form a base pair.

Tensor Decomposition Reveals Concurrent Evolutionary Convergences and Divergences and Correlations with Structural Motifs in Ribosomal RNA (Public Library of Science, 2011-04-29)
Muralidhara, Chaitanya; Gross, Andrew M.; Gutell, Robin R.; Alter, Orly

Evolutionary relationships among organisms are commonly described by a hierarchy derived from comparisons of ribosomal RNA (rRNA) sequences. We propose that even on the level of a single rRNA molecule, an organism's evolution is composed of multiple pathways due to concurrent forces that act independently on different rRNA degrees of freedom. Relationships among organisms are then compositions of coexisting pathway-dependent similarities and dissimilarities, which cannot be described by a single hierarchy. We computationally test this hypothesis in comparative analyses of 16S and 23S rRNA sequence alignments using a tensor decomposition, i.e., a framework for modeling composite data. Each alignment is encoded in a cuboid, i.e., a third-order tensor, where nucleotides, positions, and organisms each represent a degree of freedom. A tensor mode-1 higher-order singular value decomposition (HOSVD) is formulated such that it separates each cuboid into combinations of patterns of nucleotide frequency variation across organisms and positions, i.e., “eigenpositions” and corresponding nucleotide-specific segments of “eigenorganisms,” respectively, independent of a priori knowledge of the taxonomic groups or rRNA structures. We find, in support of our hypothesis, that, first, the significant eigenpositions reveal multiple similarities and dissimilarities among the taxonomic groups.
Second, the corresponding eigenorganisms identify insertions or deletions of nucleotides exclusively conserved within the corresponding groups; these map out entire substructures and are enriched in adenosines that are unpaired in the rRNA secondary structure and participate in tertiary structure interactions. This demonstrates that structural motifs involved in rRNA folding and function are evolutionary degrees of freedom. Third, both the 16S and 23S rRNA alignments reveal two previously unknown, coexisting subgenic relationships between Microsporidia and Archaea, a convergence and a divergence, conferred by insertions and deletions of these motifs; these relationships cannot be described by a single hierarchy. This shows that mode-1 HOSVD modeling of rRNA alignments might be used to computationally predict evolutionary mechanisms.
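The tensor abstract describes encoding an alignment as an organisms, positions, and nucleotides cuboid and decomposing it with a mode-1 HOSVD. As a rough, non-authoritative sketch of that idea, the Python/NumPy code here one-hot encodes a toy alignment, unfolds the tensor along its first mode, and takes the SVD of the unfolded matrix. Treating organisms as mode 1, encoding gaps as a fifth symbol, and applying no normalization are assumptions made for illustration, not details taken from the paper; the helper names `alignment_to_tensor` and `mode1_svd` are hypothetical.

```python
import numpy as np

# Assumption: gaps are encoded as a fifth "nucleotide" symbol.
NUCLEOTIDES = "ACGU-"


def alignment_to_tensor(seqs):
    """One-hot encode aligned sequences as an organisms x positions x nucleotides tensor."""
    n_pos = len(seqs[0])
    if any(len(s) != n_pos for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    tensor = np.zeros((len(seqs), n_pos, len(NUCLEOTIDES)))
    for i, seq in enumerate(seqs):
        for j, base in enumerate(seq.upper()):
            # str.index raises ValueError for symbols outside NUCLEOTIDES
            tensor[i, j, NUCLEOTIDES.index(base)] = 1.0
    return tensor


def mode1_svd(tensor):
    """Unfold along mode 1 (assumed here to be organisms) and take the SVD.

    Returns u (patterns across organisms), the singular values s, and the
    right singular vectors reshaped to (component, nucleotide, position),
    i.e. one segment over alignment positions per nucleotide.
    """
    n_org, n_pos, n_nuc = tensor.shape
    # Row i concatenates organism i's per-nucleotide vectors over positions.
    unfolded = tensor.transpose(0, 2, 1).reshape(n_org, n_nuc * n_pos)
    u, s, vt = np.linalg.svd(unfolded, full_matrices=False)
    segments = vt.reshape(-1, n_nuc, n_pos)
    return u, s, segments


if __name__ == "__main__":
    toy = ["ACGU-A", "ACGUUA", "GCGU-A", "GCAUUA"]  # hypothetical toy alignment
    u, s, segments = mode1_svd(alignment_to_tensor(toy))
    print("singular values:", np.round(s, 3))
    print("G segment of first component:", np.round(segments[0, NUCLEOTIDES.index("G")], 3))
```

Columns of `u` vary across organisms, while each right singular vector splits into one segment per nucleotide over alignment positions, which loosely mirrors the nucleotide-specific segments the abstract describes. A full HOSVD computes SVDs of the unfoldings along every mode; this sketch performs only the mode-1 step and omits whatever preprocessing the paper may apply.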