Network-on-chip architectures for scalability and service guarantees
Rapidly increasing transistor densities have led to the emergence of richly-integrated substrates in the form of chip multiprocessors and systems-on-a-chip. These devices integrate a variety of discrete resources, such as processing cores and cache memories, on a single die with the degree of integration growing in accordance with Moore's law. In this dissertation, we address challenges of scalability and quality-of-service (QOS) in network architectures of highly-integrated chips. The proposed techniques address the principal sources of inefficiency in networks-on-chip (NOCs) in the form of performance, area, and energy overheads. We also present a comprehensive network architecture capable of interconnecting over a thousand discrete resources with high efficiency and strong guarantees. We first show that mesh networks, commonly employed in existing chips, fall significantly short of achieving their performance potential due to transient congestion effects that diminish network performance. Adaptive routing has the potential to improve performance through better load distribution. However, we find that existing approaches are myopic in that they only consider local congestion indicators and fail to take global network state into account. Our approach, called Regional Congestion Awareness (RCA), improves network visibility in adaptive routers via a light-weight mechanism for propagating and integrating congestion information. By leveraging both local and non-local congestion indicators, RCA improves network load balance and boosts throughput. Under a set of parallel workloads running on a 49-node substrate, RCA reduces on-chip network latency by 16%, on average, compared to a locally-adaptive router. Next, we target NOC latency and energy efficiency through a novel point-to-multipoint topology. Ring and mesh networks, favored in existing on-chip interconnects, often require packets to go through a number of intermediate routers between source and destination nodes, resulting in significant latency and energy overheads. Topologies that improve connectivity, such as fat tree and flattened butterfly, eliminate much of the router overhead, but require non-minimal channel lengths or large channel count, reducing energy-efficiency and/or performance as a result. We propose a new topology, called Multidrop Express Channels (MECS), that augments minimally-routed express channels with multi-drop capability. The resulting richly-connected NOC enjoys a low hop count with favorable delay and energy characteristics, while improving wire utilization over prior proposals. Applications such as virtualized servers-on-a-chip and real-time systems require chip-level quality-of-service (QOS) support to provide fairness, service differentiation, and guarantees. Existing network QOS approaches suffer from considerable performance and area overheads that limit their usefulness in a resource-limited on-die network. In this dissertation, we propose a new QOS scheme called Preemptive Virtual Clock (PVC). PVC uses a preemptive approach to provide hard guarantees and strong performance isolation while dramatically reducing queuing requirements that burden prior proposals. Finally, we introduce a comprehensive network architecture that overcomes the bottlenecks of earlier designs with respect to area, energy, and QOS in future highly-integrated chips. The proposed NOC uses a topology-centric QOS approach that restricts the extent of hardware QOS support to a fraction of the network without compromising guarantees. In doing so, network area and energy efficiency are significantly improved. Further improvements are derived through a novel flow-control mechanism, along with switch- and link-level optimizations. In concert, these techniques yield a network capable of interconnecting over a thousand terminals on a die while consuming 47% less area and 26% less power than a state-of-the-art QOS-enabled NOC. The mechanisms proposed in this dissertation are synergistic and enable efficient, high-performance interconnects for future chips integrating hundreds or thousands of on-die resources. They address deficiencies in routing, topologies, and flow control of existing architectures with respect to area, energy, and performance scalability. They also serve as a building block for cost-effective advanced services, such as QOS guarantees at the die level.