Browsing by Subject "Tensor factorization"
Now showing 1 - 4 of 4
Item Computational methods for understanding genetic variations from next generation sequencing data (2018-05) Ahn, Soyeon, Ph. D.; Vikalo, Haris; de Veciana, Gustavo; Vishwanath, Sriram; Soloveichik, David; Savran, Cagri
Studies of human genetic variation reveal critical information about genetic and complex diseases such as cancer, diabetes and heart disease, ultimately leading towards improvements in health and quality of life. Moreover, understanding genetic variations in viral populations is of utmost importance to virologists and helps in the search for vaccines. Next-generation sequencing technology is capable of acquiring massive amounts of data that can provide insight into the structure of diverse sets of genomic sequences. However, reconstructing heterogeneous sequences is computationally challenging due to the large dimension of the problem and limitations of the sequencing technology.
This dissertation is focused on algorithms and analysis for two problems in which we seek to characterize genetic variations: (1) haplotype reconstruction for a single individual, the so-called single individual haplotyping (SIH) or haplotype assembly problem, and (2) reconstruction of a viral population, the so-called quasispecies reconstruction (QSR) problem. For the SIH problem, we have developed a method that relies on a probabilistic model of the data and employs the sequential Monte Carlo (SMC) algorithm to jointly determine the type of variation (i.e., perform genotype calling) and assemble haplotypes. For the QSR problem, we have developed two algorithms. The first algorithm combines agglomerative hierarchical clustering and Bayesian inference to reconstruct quasispecies characterized by low diversity. The second algorithm uses a tensor factorization framework with successive data removal to reconstruct quasispecies characterized by highly uneven frequencies of their components.
Both algorithms outperform existing methods on both benchmarking tests and real data.

Item Large scale matrix factorization with guarantees: sampling and bi-linearity (2015-12) Bhojanapalli, Venkata Sesha Pavana Srinadh; Sanghavi, Sujay Rajendra, 1979-; Caramanis, Constantine; Dhillon, Inderjit; Dimakis, Alexandros; Ravikumar, Pradeep; Ward, Rachel
Low rank matrix factorization is an important step in many high dimensional machine learning algorithms. Traditional algorithms for factorization do not scale well with growing data sizes, and there is a need for faster, more scalable algorithms. In this dissertation we explore the following two major themes to design scalable factorization algorithms for three problems: matrix completion, low rank approximation (PCA) and semi-definite optimization. (a) Sampling: We develop the optimal way to sample entries of any matrix while preserving its spectral properties. Using this sparse sketch (the set of sampled entries) instead of the entire matrix gives rise to scalable algorithms with good approximation guarantees. (b) Bi-linear factorization structure: We design algorithms that operate explicitly on the factor space instead of on the matrix. While the bi-linear structure of the factorization leads, in general, to a non-convex optimization problem, we show that under appropriate conditions these algorithms indeed recover the solution for the above problems. Both these techniques (individually or in combination) lead to algorithms with lower computational complexity and memory usage.
Finally, we extend these ideas of sampling and explicit factorization to design algorithms for higher order tensors.

Item Learning and validating clinically meaningful phenotypes from electronic health data (2018-08-06) Henderson, Jessica Lowell; Ghosh, Joydeep; Press, William H.; Mueller, Peter; van de Geijn, Robert; Paydarfar, David
The ever-growing adoption of electronic health records (EHRs) to record patients' health journeys has resulted in vast amounts of heterogeneous, complex, and unwieldy information [Hripcsak and Albers, 2013]. Distilling this raw data into clinical insights presents great opportunities and challenges for the research and medical communities. One approach to this distillation is called computational phenotyping. Computational phenotyping is the process of extracting clinically relevant and interesting characteristics from a set of clinical documentation, such as that recorded in EHRs. Clinicians can use computational phenotyping, which can be viewed as a form of dimensionality reduction where a set of phenotypes form a latent space, to reason about populations, identify patients for randomized case-control studies, and extrapolate patient disease trajectories. In recent years, high-throughput computational approaches have made strides in extracting potentially clinically interesting phenotypes from data contained in EHR systems. Tensor factorization methods have shown particular promise in deriving phenotypes.
However, phenotyping methods via tensor factorization have the following weaknesses: 1) the extracted phenotypes can lack diversity, which makes them more difficult for clinicians to reason about and utilize in practice, 2) many of the tensor factorization methods are unsupervised and do not utilize side information that may be available about the population or about the relationships between the clinical characteristics in the data (e.g., diagnoses and medications), and 3) validating the clinical relevance of the extracted phenotypes requires domain training and expertise. This dissertation addresses all three of these limitations. First, we present tensor factorization methods that discover sparse and concise phenotypes in unsupervised, supervised, and semi-supervised settings. Second, via two tools we built, we show how to leverage domain expertise in the form of publicly available medical articles to evaluate the clinical validity of the discovered phenotypes. Third, we combine tensor factorization and the phenotype validation tools to guide the discovery process to more clinically relevant phenotypes.

Item Limited feedback scheme using tensor decompositions for FDD massive MIMO systems (2020-05-14) Joe, Kevin Jinho; Andrews, Jeffrey G.
We propose a novel limited feedback scheme for massive multiple-input multiple-output (MIMO) systems in frequency-division duplexing (FDD) wideband systems. We assume that the user (UE) has knowledge of a downlink (DL) channel estimate. In order for massive MIMO systems to achieve high capacity, the base station (BS) must have the DL channel state information. Traditional feedback methods cannot work because channels for massive MIMO systems are usually too large to feed back within the coherence time. Our goal is to feed back the DL channel estimate from the UE to the BS with as little information as possible.
Our method uses two different tensor decompositions, the canonical polyadic decomposition (CPD) and the rank-(L_r, L_r, 1), or LL-1, block decomposition, on the DL frequency channel to estimate its parameters. By feeding back only the channel parameters, we show through simulations that our method is able to efficiently and accurately reconstruct the DL channel.
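
As a rough illustration of why a low-rank tensor model can shrink the amount of data to be fed back, the sketch below builds a three-way tensor from CPD factors and compares the number of tensor entries with the number of factor entries. It shows only the generic CPD model, not the scheme from the abstract above; the tensor sizes, the rank, and the function names `cp_reconstruct` and `cp_parameter_count` are assumptions chosen for illustration.

```python
import numpy as np


def cp_reconstruct(a, b, c):
    """Rebuild a 3-way tensor from CPD factors.

    X[i, j, k] = sum_r a[i, r] * b[j, r] * c[k, r]
    """
    return np.einsum("ir,jr,kr->ijk", a, b, c)


def cp_parameter_count(shape, rank):
    """Number of scalars stored in the factor matrices of a rank-`rank` CPD."""
    return rank * sum(shape)


# Hypothetical sizes: three tensor modes and a small number of rank-one terms.
shape, rank = (16, 8, 32), 3

rng = np.random.default_rng(0)
a, b, c = (rng.standard_normal((n, rank)) for n in shape)

x = cp_reconstruct(a, b, c)

full_entries = x.size                          # every entry of the tensor
cp_entries = cp_parameter_count(shape, rank)   # entries of the three factors

print(f"full tensor entries: {full_entries}")
print(f"CPD factor entries:  {cp_entries}")
```

With these assumed sizes, the factor matrices hold far fewer scalars than the tensor they describe, which is the general reason a parametric description of a structured tensor can be much cheaper to transmit than the tensor itself.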