Improving secondary structure prediction with covariation analysis and structure-based alignment system of RNA sequences

Shang, Lei, active 2013

Date

2013-12

Abstract

RNA molecules form complex higher-order structures that are essential to their biological activities. Accurate prediction of an RNA's secondary structure and other higher-order structural constraints will significantly enhance the understanding of RNA molecules and help interpret their functions. Covariation analysis is the predominant computational method for accurately predicting the base pairs in the secondary structure of RNAs. I developed a novel and powerful covariation method, the Phylogenetic Events Count (PEC) method, to determine positional covariation. Applying the PEC method to a bacterial 16S rRNA sequence alignment shows that it is more sensitive and accurate than other mutual-information-based methods in identifying base pairs and other structural constraints of the RNA structure. The analysis also discovers a new type of structural constraint, the neighbor effect: sets of nucleotides that are in proximity in the three-dimensional RNA structure show weaker but significant covariation with one another. Using these covariation methods, a proposed secondary structure model of an entire HIV-1 genome RNA is evaluated. The results reveal that the vast majority of the predicted base pairs in the proposed HIV-1 secondary structure model show no covariation and thus lack support from comparative analysis.
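As a rough illustration of what positional covariation means, the sketch below computes mutual information between two columns of a toy alignment. It is not the PEC method, whose details are not given in this abstract; it only shows the kind of mutual-information baseline the abstract compares against. The function name `mutual_information` and the four-sequence alignment are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two alignment columns.

    Each column is a sequence of characters, one per aligned sequence.
    Illustrative baseline only; this is NOT the PEC method.
    """
    if len(col_x) != len(col_y):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_x)
    count_x = Counter(col_x)                 # nucleotide counts at position x
    count_y = Counter(col_y)                 # nucleotide counts at position y
    count_xy = Counter(zip(col_x, col_y))    # joint counts of (x, y) pairs
    mi = 0.0
    for (a, b), c in count_xy.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with counts to limit rounding error
        mi += p_ab * math.log2(p_ab * n * n / (count_x[a] * count_y[b]))
    return mi


# Hypothetical toy alignment: positions 0 and 4 always form a
# Watson-Crick pair, while position 1 is invariant.
alignment = ["GCAUC", "ACAUU", "CCAUG", "UCAUA"]
columns = list(zip(*alignment))

paired = mutual_information(columns[0], columns[4])
invariant = mutual_information(columns[1], columns[4])
print(f"MI(0, 4) = {paired:.2f} bits, MI(1, 4) = {invariant:.2f} bits")
```

In this toy case the covarying pair scores high and the invariant column scores zero, which also shows a known limitation of plain mutual information: a perfectly conserved base pair carries no signal.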

Generating the most accurate multiple sequence alignment possible is fundamental to high-quality comparative analysis. The rapid determination of nucleic acid sequences has dramatically increased the number of available sequences. Developing an accurate and rapid alignment program for these RNA sequences has therefore become a vital and challenging task for extracting the maximum amount of information from the data. A template-based RNA sequence alignment system, CRWAlign-2, is developed to accurately align new sequences to an existing reference sequence alignment based on primary and secondary structural similarity. A comparison of CRWAlign-2 with eight widely used alternative alignment programs reveals that CRWAlign-2 aligns new sequences with higher accuracy than the other programs. In addition to aligning sequences accurately, CRWAlign-2 creates a secondary structure model for each sequence it aligns, which provides very useful information for the comparative analysis of RNA sequences and structures. The CRWAlign-2 program also opens opportunities in multiple areas, including the identification of chimeric 16S rRNA sequences generated in microbiome sequencing projects.