Guardband management in heterogeneous architectures
Performance and power efficiency are two of the most critical aspects of computing systems. Moore's law (the doubling of transistors in a chip every 18 months), coupled with Dennard scaling, enabled a synergy between device, circuit, microarchitecture, and architecture to drive improvements in both. With the recent end of Dennard scaling, on-chip transistor count continues to increase, but shrinking transistors no longer deliver gains in performance per watt. This divergence between rising transistor density and diminishing power-efficiency gains has driven processor design paradigm shifts: from the single-core CPU architecture to the multicore and manycore CPU architecture, and eventually to the heterogeneous architecture.

Besides performance and power efficiency, reliability is another crucial computing requirement. However, regardless of how the architecture evolves, processors still need to trade off a significant portion of performance or power efficiency to ensure reliability. When running on silicon, processors experience continuously varying operating conditions, such as process, voltage, and temperature (PVT) variation. All of these variations can slow down circuit speed and cause timing errors. The traditional approach to ensuring reliable operation under possible worst-case conditions is to statically assign a sufficiently large voltage margin (or guardband). But such an approach wastes energy, because the worst-case condition rarely occurs, and the processor could have operated at a lower voltage most of the time [36, 48, 77]. We need to actively manage the voltage guardband to fully unlock the efficiency potential of heterogeneous architectures. However, guardband management in heterogeneous architectures is a particularly challenging problem that prior work has not studied. On one hand, as transistors become smaller, the impact of PVT variation relative to the nominal voltage becomes more significant.
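The energy cost of a static guardband follows from the classic CMOS dynamic-power model, P = C·V²·f: because power scales quadratically with supply voltage, even a modest voltage margin is expensive. The sketch below illustrates this with invented, normalized numbers (the 0.95 V typical and 1.05 V guardbanded voltages are assumptions for illustration, not measurements from the thesis):

```python
# Back-of-the-envelope sketch of why guardband voltage wastes energy.
# Voltages and constants below are illustrative assumptions.

def dynamic_power(capacitance, voltage, frequency):
    """Classic CMOS dynamic power model: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency

C, F = 1.0, 1.0        # normalized switching capacitance and clock frequency
v_typical = 0.95       # voltage sufficient under typical conditions (assumed)
v_guardbanded = 1.05   # typical voltage plus a static worst-case margin (assumed)

p_worst = dynamic_power(C, v_guardbanded, F)
p_typical = dynamic_power(C, v_typical, F)
savings = 1 - p_typical / p_worst
print(f"dynamic-power savings if the margin is removed: {savings:.1%}")
```

Under these assumed numbers, removing the roughly 10% voltage margin cuts dynamic power by about 18%, which is the efficiency headroom that active guardband management tries to reclaim.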
On the other hand, increasing core counts result in a larger die area and a higher peak power consumption, both of which complicate and enlarge the impact of PVT variation. To this end, this thesis studies cross-layer mechanisms, spanning from the circuit to the (micro)architecture to the software runtime, for managing the guardband in the heterogeneous architecture. Most prior work has studied guardband management mechanisms only at the circuit or (micro)architecture level. In comparison, my colleagues and I studied cross-layer mechanisms that require lower hardware design complexity and incur less implementation overhead, because the software takes a major role in guardband management. Moreover, the cross-layer mechanisms alleviate the need for (micro)architecture-specific optimizations, which makes them scalable solutions in the current era of rapidly evolving heterogeneous architectures. This thesis performs such a study on the manycore GPU architecture, which is a representative heterogeneous architecture and has been widely adopted in mainstream computing.

The first part of the thesis focuses on the modeling and characterization of PVT variation in the GPU architecture. We first perform a thorough characterization of the underlying PVT variation's impact on the voltage guardband based on hardware measurements. After identifying voltage variation (noise) as the most challenging and essential factor for guardband management, we study methodologies for accurately modeling voltage noise in the manycore architecture. The insights into how the circuit, microarchitecture, and program interact with each other to affect PVT variation lay the foundations for the cross-layer guardband management mechanisms studied in this thesis. The second part of this thesis studies two guardband-management techniques and demonstrates that they can significantly improve the GPU architecture's energy efficiency.
We first study how to improve the worst-case guardbanding design by performing voltage smoothing, which effectively mitigates large voltage noise and achieves significant energy savings with a smaller guardband requirement. We then study how to adapt to the program-specific guardband requirement to fully unlock the current GPU's efficiency potential. We propose a mechanism called predictive guardbanding, in which the program directly predicts its voltage requirement. The proposed design leverages cross-layer optimization to minimize hardware complexity and overhead.

The last part of this thesis studies reliability optimization for the case where the prediction in predictive guardbanding fails with an unexpected error margin. We advocate maintaining reliability at the system level, and we propose a design paradigm called asymmetric resilience, whose principle is to build the reliable heterogeneous CPU-GPU system centered around the CPU. This generic design paradigm frees the GPU from reliability optimization. We present design principles and practices for heterogeneous systems that adopt such a design paradigm. Following the principles of asymmetric resilience, we demonstrate how to use the CPU architecture to handle GPU execution errors, which lets the GPU focus on typical-case operation for better energy efficiency. We explore the design space and demonstrate that this mechanism can serve as the safety net in predictive guardbanding with reasonable overhead.
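The interplay between predictive guardbanding and the asymmetric-resilience safety net can be summarized as a simple control flow: run each GPU kernel at its predicted voltage, and if the prediction proves too aggressive and a timing error is signaled, let the CPU side re-execute the kernel at the conservative worst-case voltage. The sketch below illustrates only this flow; `predict_voltage`, `run_kernel`, the `TimingError` signal, and the voltage numbers are all hypothetical placeholders, not the thesis's actual interface:

```python
# Hypothetical control-flow sketch of predictive guardbanding with a
# CPU-driven safety net. All names and numbers are invented for illustration.

V_WORST_CASE = 1.05  # conservative voltage covering worst-case PVT (assumed)

class TimingError(Exception):
    """Signals that the chosen voltage proved insufficient at runtime."""

def predict_voltage(kernel):
    # Program-level predictor: the program estimates its own voltage need.
    return kernel.get("predicted_v", V_WORST_CASE)

def run_kernel(kernel, voltage):
    # Stand-in for GPU execution; fails if voltage is below the true minimum.
    if voltage < kernel["true_v_min"]:
        raise TimingError
    return f"ran {kernel['name']} at {voltage:.2f} V"

def execute_with_safety_net(kernel):
    v = predict_voltage(kernel)
    try:
        return run_kernel(kernel, v)       # typical case: predicted voltage
    except TimingError:
        # Safety net: the CPU detects the error and re-executes the kernel
        # at the conservative worst-case voltage (asymmetric resilience).
        return run_kernel(kernel, V_WORST_CASE)

print(execute_with_safety_net({"name": "saxpy", "predicted_v": 0.92, "true_v_min": 0.90}))
print(execute_with_safety_net({"name": "fft", "predicted_v": 0.90, "true_v_min": 0.97}))
```

The design choice this flow reflects is that correctness recovery lives entirely on the CPU side, so the GPU's common-case path pays no reliability overhead and only mispredicted kernels incur the re-execution cost.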