Tensor generalizations of the singular value decomposition for integrative analysis of large-scale molecular biological data
Abstract
The structure of large-scale molecular biological data is often of an order higher than
that of a matrix, especially when integrating data from different studies. When such data are flattened
into a matrix format, much of the information they contain is lost. I describe the
use of higher-order generalizations of singular value decomposition (SVD) - both
the higher-order singular value decomposition (HOSVD) and Parallel Factorization
(PARAFAC) - in transforming tensors into simplified spaces. I apply these transformations
to a series of DNA microarray datasets from different studies tabulated
in a tensor of genes × time × conditions, specifically an integration of genome-scale
mRNA expression data from three yeast cell-cycle time courses. One time
course was measured under exposure to the oxidative stress agent hydrogen peroxide (HP),
another under exposure to menadione (MD), and the third without stress [45].
The HOSVD transforms the tensor to a “core tensor” of “eigenarrays” × “time-eigengenes”
× “condition-eigengenes,” where the eigenarrays, time-eigengenes and
condition-eigengenes are unique orthonormal superpositions of the genes, times and
conditions, respectively. This HOSVD, also known as N-mode SVD, formulates the
tensor as a linear superposition of all possible outer products of an eigenarray, a time-eigengene
and a condition-eigengene, i.e., rank-1 “subtensors,” the superposition
coefficients of which are tabulated in the core tensor. Each coefficient indicates the
significance of the corresponding subtensor in terms of the overall information it
captures in the data. PARAFAC reformulates the same data tensor as a sum of
F rank-1 tensors that best approximates the data tensor in a least-squares
sense.
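The HOSVD described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this work: the tensor is random stand-in data of hypothetical size 50 genes × 6 times × 3 conditions, the helper names (unfold, hosvd, reconstruct) are invented for the example, and the significance of each subtensor is taken to be its squared core coefficient divided by the sum of all squared coefficients, which is one common normalization.

```python
import numpy as np

def unfold(tensor, mode):
    """Flatten a tensor into a matrix whose rows index the chosen mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `matrix` into axis `mode` of `tensor` (n-mode product)."""
    out = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(out, 0, mode)

def hosvd(tensor):
    """Return the core tensor and one orthonormal basis per mode."""
    factors = []
    for mode in range(tensor.ndim):
        # Left singular vectors of each unfolding give that mode's basis.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u)
    core = tensor
    for mode, u in enumerate(factors):
        # Project each mode onto its basis to obtain the coefficients.
        core = mode_product(core, u.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Rebuild the tensor as the superposition of all rank-1 subtensors."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out

# Hypothetical stand-in data: genes x time x conditions.
rng = np.random.default_rng(0)
data = rng.standard_normal((50, 6, 3))

core, factors = hosvd(data)

# Assumed significance measure: normalized squared core coefficients.
fractions = core**2 / np.sum(core**2)

# Indices (eigenarray, time-eigengene, condition-eigengene) of the
# most significant subtensor under this measure.
top = np.unravel_index(np.argmax(fractions), fractions.shape)
```

In this sketch, factors[0], factors[1] and factors[2] hold the eigenarrays, time-eigengenes and condition-eigengenes as columns, and reconstruct(core, factors) recovers the input tensor up to floating-point error.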
I show that significant rank-1 subtensors can be associated with independent biological
processes, which are manifested in the data tensor. Subtensors of the HOSVD
capture the subprocesses: stress response, pheromone response and developmental
stage. The data suggest that the conserved genes YKU70, MRE11, AIF1 and
ZWF1, as well as the genes involved in the processes of retrotransposition, apoptosis
and the oxidative pentose phosphate cycle, may play significant, yet previously unrecognized,
roles in the differential effects of HP and MD on cell cycle progression.
Subtensors of PARAFAC capture the same biological processes as the two most significant
HOSVD subtensors. A genome-wide correlation between DNA replication
and initiation of RNA transcription, which is equivalent to a recently discovered
correlation and might be due to a previously unknown mechanism of regulation, is
independently uncovered.