# Browsing by Subject "Matrix factorization"

Now showing 1 - 4 of 4


## Algorithm/architecture codesign of low power and high performance linear algebra compute fabrics (2013-08)

Pedram, Ardavan; Gerstlauer, Andreas; Van de Geijn, Robert A.

In the past, we could rely on technology scaling and new micro-architectural techniques to improve the performance of processors. Nowadays, both of these methods are reaching their limits. The primary concern in future architectures with billions of transistors on a chip and limited power budgets is power/energy efficiency. Full-custom design of application-specific cores can yield up to two orders of magnitude better power efficiency than conventional general-purpose cores. However, a tremendous design effort is required to integrate a new accelerator for each new application. In this dissertation, we present the design of specialized compute fabrics that maintain the efficiency of full-custom hardware while providing enough flexibility to execute a whole class of coarse-grain operations. The broad vision is to develop integrated and specialized hardware/software solutions that are co-optimized and co-designed across all layers, ranging from the basic hardware foundations all the way to application programming support through standard linear algebra libraries. We address these issues specifically in the context of dense linear algebra applications. In the process, we pursue the main questions that architects will face while designing such accelerators. How broad is the class of applications that the accelerator can support? What are the limiting factors that prevent utilization of these accelerators on the chip? What is the maximum achievable performance/efficiency? Answering these questions requires expertise and careful codesign of the algorithms and the architecture to select the best possible components, datapaths, and data movement patterns, resulting in a more efficient hardware-software codesign.
In some cases, codesign reduces complexities imposed on the algorithm side by initial limitations in the architecture. We design a specialized Linear Algebra Processor (LAP) architecture and discuss the details of mapping matrix-matrix multiplication onto it. We further verify the flexibility of our design for computing a broad class of linear algebra kernels. We conclude that this architecture can perform a broad range of matrix-matrix operations as complex as matrix factorizations, and even Fast Fourier Transforms (FFTs), while maintaining its ASIC-level efficiency. We present a power-performance model that compares state-of-the-art CPUs and GPUs with our design and reveals sources of inefficiency in CPUs and GPUs. We demonstrate how to overcome such inefficiencies in the process of designing our LAP. As we progress through this dissertation, we introduce modifications of the original matrix-matrix multiplication engine to facilitate the mapping of more complex operations, and observe the resulting performance and efficiency of the modified engine using our power estimation methodology. Compared to other conventional architectures for linear algebra applications and FFT, our LAP is over an order of magnitude better in terms of power efficiency. Based on our estimates, up to 55 and 25 GFLOPS/W single- and double-precision efficiencies are achievable on a single chip in standard 45 nm technology.

## Algorithms for sparse and structurally constrained discrete optimization problems in bioinformatics and communications (2017-05)

Barik, Somsubhra; Vikalo, Haris; Veciana, Gustavo De; Ghosh, Joydeep; Soloveichik, David; Press, William H.

Inference from high-dimensional noisy data, a task encountered in a wide range of applications including wireless communications and bioinformatics, is often computationally challenging.
Structural constraints that sometimes restrict the space of possible solutions may adversely affect the performance of existing algorithms. Furthermore, when problem variables come from a finite alphabet, such inference problems are often NP-hard. However, one can exploit auxiliary information about the problem's structure to improve accuracy and reduce the computational cost of finding a solution to such inference problems. In the first part of this dissertation, we study the setting where an ℓ₀-norm constraint is imposed on the solution to the integer least-squares problem. We propose a sparsity-aware sphere decoding algorithm for finding a near-optimal solution to this problem, analyze its computational complexity for commonly used alphabets, and propose a fast variant. These algorithms can be successfully applied to a variety of settings, as demonstrated in an application to sparse channel estimation. In the second part, we focus on discrete-valued data clustering problems with observations that have non-zero representation on subsets of features, leading to sparsity in the object-to-object similarity space. Clustering objects under such structural constraints is computationally hard; however, one can leverage the inherent sparsity to reduce the computational burden of sub-optimal heuristics. We study the problems of Single Individual Haplotyping (SIH) and Quasispecies Reconstruction (QSR), which involve determining the genetic variations underlying disease susceptibility in humans and evolutionary fitness in RNA viruses, respectively. These problems require reconstructing finite-alphabet parental sequences from a library of sparse random samples of their mixtures.
Our contribution here is two-fold: (i) the development of a binary-constrained alternating minimization approach for SIH, together with the first theoretical guarantees on its error performance, and (ii) the design of an end-to-end clustering framework for QSR that estimates the number, composition, and proportions of the distinct viral sequences. Experimental results with synthetic and biological datasets from state-of-the-art libraries demonstrate the efficacy of the proposed algorithms. The respective software implementations, called HapAltMin and QSdpR, have been made available to the bioinformatics community.

## Distributed and dynamic factor modeling of online data (2017-08)

Teffer, Dean Whitney; Ghosh, Joydeep; Khurshid, Sarfraz; Julien, Christine; Sanghavi, Sujay; Chakrabarti, Deepayan

The domain of data mining and machine learning has expanded rapidly in recent years to include both large-scale distributed and streaming computation. Although many open-source and cloud-based frameworks are available for these tasks, many of them used in production by industry, this is a rapidly evolving technology landscape, and the gap between academic algorithm development and code usable with real-world data has grown. In addition, although there is a rich history of mathematical models for streaming data on continuous vector spaces, there has been significantly less work on streaming discrete spaces. However, much, if not most, of the data available online consists of high-dimensional sparse counts, such as text corpora and interaction networks. We attempt to help bridge this gap by extending promising Bayesian Poisson factorization and co-factorization models that can be used, for example, to model not only text corpora but also related user interactions in a social network.
We construct a dependent process prior that enables dynamic latent factor modeling in the natural probability space of the factors, rather than in the raw data. These models are then scaled up and implemented for distributed compute systems and streaming data. We develop an adaptive hashing method (AdaHash) for lambda architectures that can use latent factors calculated during periodic batch-mode updates as a similarity metric for hierarchical grouping, or for finding similar factors to reconcile parameters in a distributed compute scenario. In addition, we develop a novel hidden Markov variant that uses particle filters to update prior factors and probabilistically group them with new factors in a dynamic inference model (D-GaPS). We show experimentally that the distributed model converges to factors similar to those of single-process inference, and that the dynamic model yields higher-quality topics than batch-mode alternatives. Empirical studies are presented on a U.S. Senate voting and bill summary dataset that is readily interpretable with regard to latent factors.

## Scalable algorithms for latent variable models in machine learning (2016-08)

Yu, Hsiang-Fu; Dhillon, Inderjit S.; Pingali, Keshav; Qiu, Lili; Vishwananthan, S.V.N.; Lin, Chih-Jen

Latent variable modeling (LVM) is a popular approach in many machine learning applications, such as recommender systems and topic modeling, due to its ability to succinctly represent data even in the presence of many missing entries. Existing learning methods for LVMs, while attractive, are infeasible for the large-scale datasets encountered in modern big data applications. In addition, such applications often come with various types of side information, such as the text descriptions of items and the social network among users in a recommender system.
In this thesis, we present scalable learning algorithms for a wide range of latent variable models, such as low-rank matrix factorization and latent Dirichlet allocation. We also develop simple but effective techniques to extend existing LVMs to exploit various types of side information and make better predictions in applications such as recommender systems, multi-label learning, and high-dimensional time-series prediction. In addition, we propose a novel approach to the maximum inner product search problem to accelerate the prediction phase of many latent variable models.
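The first dissertation above centers on mapping matrix-matrix multiplication onto a specialized engine. As a point of reference, here is a minimal NumPy sketch of the blocked, rank-k-update formulation of GEMM that such designs stream through their processing-element arrays; the function name and blocking factor are illustrative and are not taken from the LAP design itself.

```python
import numpy as np

def blocked_gemm(A, B, C, bk=4):
    """Accumulate C += A @ B as a sequence of rank-bk updates.

    Illustrative only: an accelerator streams bk-wide panels of A and B
    through its PE array; here each 'panel pass' is one NumPy product.
    """
    m, k = A.shape
    k2, n = B.shape
    assert k == k2 and C.shape == (m, n)
    for p in range(0, k, bk):
        # One rank-bk update: an (m x bk) panel times a (bk x n) panel.
        C += A[:, p:p + bk] @ B[p:p + bk, :]
    return C
```

The blocking factor `bk` stands in for the panel width an accelerator would fix in hardware; note the result is independent of it.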
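The SIH contribution in the second dissertation is a binary-constrained alternating minimization. The toy sketch below illustrates the general alternating scheme for the diploid case (assign each read to one of two complementary haplotypes, then re-estimate the haplotype by a per-site vote); it is an assumption-laden illustration of the technique, not the HapAltMin implementation.

```python
import numpy as np

def alt_min_haplotype(R, iters=20):
    """Toy binary-constrained alternating minimization for diploid SIH.

    R: reads x sites matrix with entries in {+1, -1, 0}, 0 = unobserved.
    Alternates between (a) assigning each read to haplotype h or its
    complement -h and (b) re-estimating h by a sign-corrected vote.
    """
    h = np.sign(R[0] + 1e-9)           # crude initialization from one read
    for _ in range(iters):
        # (a) membership: +1 -> read matches h, -1 -> matches -h
        s = np.sign(R @ h + 1e-9)
        # (b) per-site vote over sign-corrected reads
        h = np.sign(s @ R + 1e-9)
    return h, s
```

The recovered haplotype is only identifiable up to a global sign flip, since swapping the two haplotypes leaves the read data unchanged.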
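The third dissertation builds on Bayesian Poisson factorization of sparse counts. As a rough, non-Bayesian stand-in, here is the classic multiplicative-update NMF under the generalized KL divergence (Lee-Seung), whose objective matches the Poisson likelihood of a low-rank rate matrix; all names and defaults below are illustrative, not drawn from the dissertation.

```python
import numpy as np

def poisson_mf(X, k=5, iters=200, seed=0, eps=1e-10):
    """KL-divergence NMF: a point-estimate stand-in for Bayesian Poisson
    factorization. Factors X ~ W @ H with multiplicative updates that
    keep W, H nonnegative, suitable for sparse count data."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + 0.1
    H = rng.random((k, n)) + 0.1
    for _ in range(iters):
        WH = W @ H + eps
        W *= (X / WH) @ H.T / (H.sum(axis=1) + eps)   # update rows' factors
        WH = W @ H + eps
        H *= W.T @ (X / WH) / (W.sum(axis=0)[:, None] + eps)  # update cols'
    return W, H
```

The full models in the dissertation go further: gamma priors on W and H, dependent priors over time, and distributed/streaming inference rather than batch multiplicative updates.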
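The last abstract mentions accelerating prediction via maximum inner product search (MIPS), the bottleneck when scoring a low-rank model (score = uᵀv) over a large item set. The thesis's own algorithm is not reproduced here; the sketch below shows the brute-force baseline and the well-known norm-augmentation reduction of MIPS to Euclidean nearest-neighbor search.

```python
import numpy as np

def mips_exact(Q, P):
    """For each query row in Q, return the index of the item row in P
    maximizing the inner product. Brute-force baseline."""
    return np.argmax(Q @ P.T, axis=1)

def mips_via_nn(Q, P):
    """Reduce MIPS to nearest-neighbor search by appending one coordinate
    so every item vector has the same norm; then ||q' - p'||^2 =
    ||q||^2 + m^2 - 2<q, p>, so the nearest neighbor is the MIPS answer.
    Any Euclidean NN index could serve the augmented vectors."""
    norms = np.linalg.norm(P, axis=1)
    m = norms.max()
    P_aug = np.hstack([P, np.sqrt(m**2 - norms**2)[:, None]])
    Q_aug = np.hstack([Q, np.zeros((Q.shape[0], 1))])
    d = ((Q_aug[:, None, :] - P_aug[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d, axis=1)
```

The reduction matters because, unlike the raw inner product, Euclidean distance admits standard sublinear indexing structures.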