Instruction Scheduling for EDGE Architectures
Access full-text files
Date
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Instruction scheduling is a code reordering transformation used to hide latencies and expose parallelism in modern day microprocessors. Scheduling is often critical in achieving peak performance on these processors. The instruction scheduler is responsible for dispatching instructions to functional units based on dependencies, latency, and resource availability. The scheduler improves performance by reordering instructions to expose parallelism, reducing pipeline hazards, and firing high latency instructions early. The focus of this research is to develop instruction scheduling algorithms for emerging EDGE architectures like TRIPS (Tera-op Reliable Intelligently adaptive Processing System). These processors have heavily partitioned execution substrates with many ALUs to deal with limits in pipeline depth, clock rate growth, and on-chip wire speeds. Instruction scheduling for these processors is difficult because the instruction execution window is large and resources are distributed across the grid with non-zero communication latencies. However, a large instruction execution window offers many optimization opportunities to the scheduler. The architecture exposes wire-delays to the scheduler so that it can reduce routing distances between dependent instructions and network contention. The scheduler must efficiently utilize processor’s computation (ALUs) and communication (network links, memory bandwidth, etc.) resources to achieve high performance. These challenges make instruction scheduling for grid processors a harder problem and more important to performance than for conventional machines. The main contribution of this work is a number of instruction scheduling algorithms for the TRIPS processor. These algorithms range from a list scheduling approach to instruction pre-placement to physical path scheduling that models distances between resources and their use by the program. We also develop an evaluation model to measure the effectiveness of these algorithms. Instruction scheduling algorithms based on list scheduling have been the dominant technique used for scheduling problems. This work explores the strengths and weaknesses of the list scheduling algorithm (LSA). We propose a new framework, called distance sensitive framework (DSF), to effectively address the deficiencies of the LSA. The DSF adapts scheduling policies based on program characteristics. We examine DSF performance across various benchmarks and compare its performance against an LSA implementation. We evaluate our framework based on benchmark programs and a set of real applications. These programs include both highly serial and highly parallel code to test scheduler performance across a broad range of applications. We run our experiments on a cycle accurate architectural simulator and compare instructions per cycle (IPC) for LSA and DSF implementations. The DSF implementation on the TRIPS processor simulator obtains an average performance improvement of 12% over LSA