Browsing by Subject "Fast Fourier transform"
Item Algorithm/architecture codesign of low power and high performance linear algebra compute fabrics (2013-08) Pedram, Ardavan; Gerstlauer, Andreas, 1970-; Van de Geijn, Robert A.
In the past, we could rely on technology scaling and new micro-architectural techniques to improve the performance of processors. Nowadays, both of these methods are reaching their limits. The primary concern in future architectures with billions of transistors on a chip and limited power budgets is power/energy efficiency. Full-custom design of application-specific cores can yield up to two orders of magnitude better power efficiency over conventional general-purpose cores. However, a tremendous design effort is required in integrating a new accelerator for each new application. In this dissertation, we present the design of specialized compute fabrics that maintain the efficiency of full custom hardware while providing enough flexibility to execute a whole class of coarse-grain operations. The broad vision is to develop integrated and specialized hardware/software solutions that are co-optimized and co-designed across all layers ranging from the basic hardware foundations all the way to the application programming support through standard linear algebra libraries. We try to address these issues specifically in the context of dense linear algebra applications. In the process, we pursue the main questions that architects will face while designing such accelerators. How broad is this class of applications that the accelerator can support? What are the limiting factors that prevent utilization of these accelerators on the chip? What is the maximum achievable performance/efficiency? Answering these questions requires expertise and careful codesign of the algorithms and the architecture to select the best possible components, datapaths, and data movement patterns resulting in a more efficient hardware-software codesign. 
In some cases, codesign reduces complexities that are imposed on the algorithm side due to the initial limitations in the architectures. We design a specialized Linear Algebra Processor (LAP) architecture and discuss the details of mapping of matrix-matrix multiplication onto it. We further verify the flexibility of our design for computing a broad class of linear algebra kernels. We conclude that this architecture can perform a broad range of matrix-matrix operations as complex as matrix factorizations, and even Fast Fourier Transforms (FFTs), while maintaining its ASIC-level efficiency. We present a power-performance model that compares state-of-the-art CPUs and GPUs with our design. Our power-performance model reveals sources of inefficiencies in CPUs and GPUs. We demonstrate how to overcome such inefficiencies in the process of designing our LAP. As we progress through this dissertation, we introduce modifications of the original matrix-matrix multiplication engine to facilitate the mapping of more complex operations. We observe the resulting performance and efficiencies on the modified engine using our power estimation methodology. When compared to other conventional architectures for linear algebra applications and FFT, our LAP is over an order of magnitude better in terms of power efficiency. Based on our estimations, up to 55 and 25 GFLOPS/W single- and double-precision efficiencies are achievable on a single chip in standard 45 nm technology.
Item Travel time reliability assessment techniques for large-scale stochastic transportation networks (2010-05) Ng, Man Wo; Waller, S. Travis; Kockelman, Kara M.; Zhang, Zhanmin; Hasenbein, John J.; Morton, David P.
Real-life transportation systems are subject to numerous uncertainties in their operation. Researchers have suggested various reliability measures to characterize their network-level performances. 
One of these measures is travel time reliability, defined as the probability that travel times remain below certain (acceptable) levels. Existing reliability assessment (and optimization) techniques tend to be computationally intensive. In this dissertation we develop computationally efficient alternatives. In particular, we make the following three contributions. In the first contribution, we present a novel reliability assessment methodology for the case where the source of uncertainty is given by road capacities. More specifically, we present a method based on the theory of Fourier transforms to numerically approximate the probability density function of the (system-wide) travel time. The proposed methodology takes advantage of the established computational efficiency of the fast Fourier transform. In the second contribution, we relax the common assumption that the probability distributions of the sources of uncertainty are known explicitly. In reality, these distributions may be unavailable (or inaccurate) as we may have no (or insufficient) data to calibrate them. We present a new method to assess travel time reliability that is distribution-free in the sense that the methodology requires only that the first N moments (where N is any positive integer) of the travel time be known and that the travel times reside in a set of known and bounded intervals. Instead of deriving exact probabilities of travel times exceeding certain thresholds via computationally intensive methods, we develop analytical probability inequalities to quickly obtain upper bounds on the desired probability. Because of the computationally intensive nature of (virtually all) existing reliability assessment techniques, the optimization of the reliability of transportation systems has generally been computationally prohibitive. 
The third and final contribution of this dissertation is the introduction of a new transportation network design model in which the objective is to minimize the unreliability of travel time. The computational requirements are shown to be much lower due to the assessment techniques developed in this dissertation. Moreover, numerical results suggest that the model has the potential to form a computationally efficient proxy for current simulation-based network design models.
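
To make the FFT idea in the abstract above concrete, here is a minimal sketch of one standard way the fast Fourier transform speeds up a travel time distribution calculation. It is not the dissertation's method: it assumes (hypothetically) that the total travel time is a sum of independent link travel times, each discretized on a common 1-minute grid, so that the distribution of the sum is a convolution that can be computed by multiplying FFTs. The function name pmf_of_sum and all link probabilities are made up for illustration.

```python
import numpy as np


def pmf_of_sum(pmfs):
    """PMF of a sum of independent discrete variables on a shared grid.

    Each entry of `pmfs` is a 1-D array; index k holds the probability
    that the variable equals k grid steps (assumption: same step for all).
    """
    # Length of the full linear convolution of all inputs.
    n = sum(len(p) for p in pmfs) - len(pmfs) + 1
    # Zero-pad to a power of two so circular convolution equals linear.
    size = 1 << (n - 1).bit_length()
    spectrum = np.ones(size // 2 + 1, dtype=complex)
    for p in pmfs:
        # Convolution in the time domain is a product in the Fourier domain.
        spectrum *= np.fft.rfft(p, size)
    total = np.fft.irfft(spectrum, size)[:n]
    # Remove tiny negative values from round-off and renormalize.
    total = np.clip(total, 0.0, None)
    return total / total.sum()


# Hypothetical link travel times in minutes (index = minutes).
link_a = np.array([0.0, 0.5, 0.5])         # 1 or 2 minutes
link_b = np.array([0.0, 0.0, 0.25, 0.75])  # 2 or 3 minutes
link_c = np.array([0.0, 1.0])              # always 1 minute

total = pmf_of_sum([link_a, link_b, link_c])

# Travel time reliability: probability the total stays at or below 5 minutes.
threshold = 5
reliability = total[: threshold + 1].sum()
print(reliability)
```

With these made-up inputs the total takes the values 4, 5, and 6 minutes, and the reliability at a 5-minute threshold comes out at approximately 0.625. The independence and common-grid assumptions are what make the product of transforms valid; correlated link times would need a different treatment.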