Network-on-chip implementation and performance improvement through workload characterization and congestion awareness

Gratz, Paul V., 1970-

Network-on-chip implementation and performance improvement through workload characterization and congestion awareness

Access full-text files

gratzp71082.pdf (4.1 MB)

Date

2008-12

Authors

Gratz, Paul V., 1970-

Abstract

Off-chip interconnection networks provide for communication between processors and components within computer systems. Semiconductor process technology trends have led to the inclusion of multiple processors and components onto a single chip and recently research has focused on interconnection networks, on-chip, to connect them together. On-chip networks provide a scalable, high-bandwidth interconnect, integrated tightly with the microarchitecture to achieve high performance. On-chip networks present several new challenges, different from off-chip networks, including tighter constraints in power, area and end-to-end latency. In this dissertation, I propose interconnection network architectures that address the unique design challenges of power and end-to-end latency on chip. My work in the design, implementation and evaluation of the on-chip networks of the TRIPS project’s prototype processor, a real hardware implementation, is the foundation for my work in on-chip networking. Based on my analysis of the TRIPS on-chip networks and their workloads, I propose, design, and evaluate novel network architectures for congestion monitoring and adaptive routing that are matched to the design constraints of on-chip networks. In the TRIPS system we designed, and implemented in silicon, a distributed processor microarchitecture where traditional processor components are divided into a collection of self-contained tiles. One novel aspect of the TRIPS system is the control and data networks that the tiles use to communicate with one another. I worked on the design and implementation of one of these networks, the On-Chip Network (OCN). The OCN, a 4x10 mesh network, interconnects the tiles of the L2 cache, the two processor cores and various I/O units. Another on-chip network, the Operand Network (OPN), interconnects the execution units and serves as a bypass network, integrated tightly with the processor core. In this document I evaluate these two on-chip networks and their workloads, these evaluations serve as case studies in how on-chip design constraints affect the design of on-chip networks. In the examination of the TRIPS OCN and OPN networks, one insight we gained was that network resource imbalances can lead to congestion and poor performance. We found these imbalances are transient with time and task. Timely information about the status of the network can be used to balance the resource utilization, or reduce power. A challenge lies in providing the right information, conveyed in a timely fashion, as the metrics and methods used in off-chip networks do not map well to on-chip networks. In this document, I propose and evaluate several metrics of network congestion for their utility and feasibility in an on-chip environment. In our examination of the TRIPS on-chip networks we also found that minimizing end-to-end packet latency was critical to maintaining good system performance. Effective use of the congestion information without impact to end-to-end latency is another challenge in on-chip networking. I explore novel adaptive routing techniques that address the challenge of managing the end-to-end latency. A method that produces good results is aggregation of network status information, reducing both the bandwidth and latency required for status information transmission. In this dissertation I examine how well this technique and others compare with conventional oblivious and adaptive routing.