Control flow speculation for distributed architectures
MetadataShow full item record
As transistor counts, power dissipation, and wire delays increase, the microprocessor industry is transitioning from chips containing large monolithic processors to multi-core architectures. The granularity of cores determines the mechanisms for branch prediction, instruction fetch and map, data supply, instruction execution, and completion. Accurate control flow prediction is essential for high performance processors with large instruction windows and high-bandwidth execution. This dissertation considers cores with very large granularity, such as TRIPS, as well as cores with extremely small granularity, such as TFlex, and explores control flow speculation issues in such processors. Both TRIPS and TFlex are distributed block-based architectures and require control speculation mechanisms that can work in a distributed environment while supporting efficient block-level prediction, misprediction detection, and recovery. This dissertation aims at providing efficient control flow prediction techniques for distributed block-based processors. First, we discuss simple exit predictors inspired by branch predictors and describe the design of the TRIPS prototype block predictor. Area and timing trade-offs in the predictor implementation are presented. We report the predictor misprediction rates from the prototype chip for the SPEC benchmark suite. Next, we look at the performance bottlenecks in the prototype predictor and present a detailed analysis of exit and target predictors using basic prediction components inspired from branch predictors. This study helps in understanding what types of predictors are effective for exit and target prediction. Using the results of our prediction analysis, we propose novel hardware techniques to improve the accuracy of block prediction. To understand whether exit prediction is inherently more difficult than branch prediction, we measure the correlation among branches in basic blocks and hyperblocks and examine the loss in correlation due to hyperblock construction. Finally, we propose block predictors for TFlex, a fully distributed architecture that uses composable lightweight processors. We describe various possible designs for distributed block predictors and a classification scheme for such predictors. We present results for predictors from each of the design points for distributed prediction.