Matrix and tensor decomposition methods as tools for understanding sequence-structure relationships in sequence alignments
We describe the use of a tensor mode-1 higher-order singular value decomposition (HOSVD) in the analyses of alignments of 16S and 23S ribosomal RNA (rRNA) sequences, each encoded in a cuboid of frequencies of nucleotides across positions and organisms. This mode-1 HOSVD separates the data cuboids into combinations of patterns of nucleotide frequency variation across the positions and organisms, i.e., "eigenorganisms" and corresponding nucleotide-specific segments of "eigenpositions," respectively, independent of a priori knowledge of the taxonomic groups and their relationships, or the rRNA structures. We show that this mode-1 HOSVD provides a mathematical framework for modeling the sequence alignments where the mathematical variables, i.e., the significant eigenpositions and eigenorganisms, are consistent with current biological understanding of the 16S and 23S rRNAs. First, the significant eigenpositions identify multiple relations of similarity and dissimilarity among the taxonomic groups, some known and some previously unknown. Second, the corresponding eigenorganisms identify positions of nucleotides exclusively conserved within the corresponding taxonomic groups, but not among them, that map out entire substructures inserted or deleted within one taxonomic group relative to another. These positions are also enriched in adenosines that are unpaired in the rRNA secondary structure, the majority of which participate in tertiary structure interactions, and some also map to the same substructures. This demonstrates that an organism's evolutionary pathway is correlated, and possibly also causally coordinated, with insertions or deletions of entire rRNA substructures and unpaired adenosines, i.e., structural motifs that are involved in rRNA folding and function.
Third, this mode-1 HOSVD reveals two previously unknown subgenic relationships of convergence and divergence between the Archaea and Microsporidia, which might correspond to two evolutionary pathways, in both the 16S and 23S rRNA alignments. This demonstrates that even on the level of a single rRNA molecule, an organism's evolutionary pathway is composed of different types of changes in structure in reaction to multiple concurrent evolutionary forces.
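
As an illustration of the decomposition described above, the following is a minimal NumPy sketch of a mode-1 HOSVD computed as the SVD of a data cuboid unfolded along its organisms axis. It is not the implementation used in this work. The binary indicator encoding, the ordering of the unfolded columns, the toy alignment, and the names encode_alignment, mode1_hosvd, across_organisms, and across_positions are all assumptions made for the example.

```python
import numpy as np

NUCLEOTIDES = "ACGU"  # assumed alphabet; gaps and other symbols are left as zeros


def encode_alignment(sequences):
    """Encode aligned sequences as an organisms x positions x nucleotides cuboid.

    Assumption: a binary indicator encoding (1 where a nucleotide occurs).
    """
    n_org, n_pos = len(sequences), len(sequences[0])
    cuboid = np.zeros((n_org, n_pos, len(NUCLEOTIDES)))
    for i, seq in enumerate(sequences):
        for j, symbol in enumerate(seq):
            k = NUCLEOTIDES.find(symbol)
            if k >= 0:
                cuboid[i, j, k] = 1.0
    return cuboid


def mode1_hosvd(cuboid):
    """SVD of the cuboid unfolded along the organisms axis (mode 1)."""
    n_org, n_pos, n_nuc = cuboid.shape
    # Append the nucleotide-specific slices side by side:
    # rows are organisms, columns are (nucleotide, position) pairs.
    unfolded = cuboid.transpose(0, 2, 1).reshape(n_org, n_nuc * n_pos)
    u, s, vt = np.linalg.svd(unfolded, full_matrices=False)
    # Column k of u is a pattern of variation across the organisms.
    across_organisms = u
    # Row k of vt is a pattern across the positions, split here into
    # one segment per nucleotide: shape (patterns, nucleotides, positions).
    across_positions = vt.reshape(-1, n_nuc, n_pos)
    # Fraction of the overall signal captured by each pattern pair.
    fractions = s**2 / np.sum(s**2)
    return across_organisms, s, across_positions, fractions


# Hypothetical toy alignment: two groups of two organisms, six positions.
toy_alignment = ["ACGUAC", "ACGUAC", "UGCAAC", "UGCA-C"]
cuboid = encode_alignment(toy_alignment)
across_organisms, s, across_positions, fractions = mode1_hosvd(cuboid)
print(np.round(fractions, 3))
```

In this sketch the patterns with the largest fractions would be the candidates for "significant" patterns. How significance is assessed in the actual analyses is not specified here.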