A technology-scalable composable architecture
Abstract
Clock rate scaling can no longer sustain computer system performance scaling, due to power and thermal constraints and the diminishing performance returns of deep pipelining. Future performance improvements must therefore come from mining concurrency from applications. However, increasing global on-chip wire delays will limit the amount of state available in a single cycle, thereby hampering the ability to mine concurrency with conventional approaches. To address these technology challenges, the processor industry has migrated to chip multiprocessors (CMPs). The disadvantage of conventional CMP architectures, however, is their relative inflexibility in meeting the wide range of application demands and operating targets that now exist. The processor granularity (e.g., issue width), the number of processors on a chip, and the memory hierarchy are fixed at design time based on the target workload mix, which results in suboptimal operation as the workload mix and operating targets change over time. In this dissertation, we explore the concept of composability to address both the increasing wire delay problem and the inflexibility of conventional CMP architectures. The basic concept of composability is the ability to dynamically adapt to diverse applications and operating targets, both in terms of granularity and functionality, by aggregating fine-grained processing units or memory units. First, we propose a composable on-chip memory substrate, called Non-Uniform Access Cache Architecture (NUCA), to address increasing on-chip wire delay for future large caches. The NUCA substrate breaks large on-chip memories into many fine-grained memory banks that are independently accessible, with a switched network embedded in the cache. Lines can be mapped into this array of memory banks with fixed mappings or with dynamic mappings, in which cache lines can move around within the cache to further reduce the average cache hit latency.
Second, we evaluate a range of strategies to build a composable processor. Composable processors provide the flexibility to adapt the granularity of processors to various application demands and operating targets, and thus to choose the hardware configuration best suited to any given point. A composable processor consists of a large number of low-power, fine-grained processor cores that can be aggregated dynamically to form more powerful logical processors. We present architectural innovations to support composability in a power- and area-efficient manner.
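The fixed-mapping variant of the banked cache described in the abstract can be illustrated with a small model. This is only a sketch under stated assumptions, not the dissertation's design: the grid size, line size, cycle counts, the controller position at bank (0, 0), and the names `NucaGrid`, `bank_of`, and `hit_latency` are all hypothetical and chosen for illustration. It shows why hit latency becomes non-uniform once a large cache is split into independently accessible banks reached over a switched network.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NucaGrid:
    """Toy model of a banked cache; every parameter below is an assumption."""

    rows: int = 4          # assumed height of the bank array
    cols: int = 4          # assumed width of the bank array
    line_bytes: int = 64   # assumed cache-line size
    bank_cycles: int = 3   # assumed access time of a single bank
    hop_cycles: int = 1    # assumed delay per network hop

    def bank_of(self, addr: int) -> tuple[int, int]:
        """Fixed mapping: interleave consecutive cache lines across banks."""
        line = addr // self.line_bytes
        index = line % (self.rows * self.cols)
        return divmod(index, self.cols)  # (row, col) of the bank

    def hit_latency(self, addr: int) -> int:
        """Hit latency seen from a controller assumed to sit at bank (0, 0)."""
        row, col = self.bank_of(addr)
        hops = row + col  # Manhattan distance through the switched network
        return self.bank_cycles + hops * self.hop_cycles
```

In this model, a line that maps to the bank next to the controller costs only the bank access time, while a line in the far corner also pays for every network hop. A dynamic mapping, as the abstract notes, would let frequently used lines move toward the closer banks; that policy is not modeled here.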