Power-aware processor system design
Access full-text files
Date
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
With everyday advances in technology and low-cost economics, processor systems are moving towards split grid shared power delivery networks (PDNs) while providing increased functionality and higher performance capabilities resulting in increased power consumption. Split grid refers to dividing up the power grid resources among various homogeneous and heterogeneous functional modules and processors. When the PDN is shared and common across multiple processors and function blocks, it is called a Shared PDN. In order to keep the power in control on a split-grid shared PDN, the processor system is required to operate when various hardware modules interact with each other while the supply voltage (V [subscript DD]) and clock frequency (F [subscript CLK]) are scaled. Software or hardware assisted power-collapse and low-power retention modes can be automatically engaged in the processor system. The processor system should also operate at maximum performance under power constraints while consuming the full thermal design power (TDP). The processor system should neither violate board and card current limits nor the power management integrated circuit (PMIC) limits or its slew rate requirements for current draw on the shared PDN. It is expected to operate within thermal limits below an operating temperature. The processor system is also required to detect and mitigate current violations within microseconds and temperature violations in milliseconds. The processor system is expected to be robust and should be able to tolerate voltage droops. Its importance is highlighted with the processor system being on shared PDN. Because of the sharing of the PDN, the voltage droop mitigation scheme is expected to be quick and must suppress V [subscript DD] droop propagation at the source while only introducing negligible performance penalties during this mitigation. Without a solution for V [subscript DD] droop in place, the entire V [subscript DD] of shared PDN is forced to be at a higher voltage, increasing overall system power. This can potentially affect the days of use (DoU) of battery-operated systems, and reliability and cooling of wired systems. A multi-threaded processor system is expected to monitor the current, power and voltage violations and react quickly without affecting the performance of its hardware threads while maintaining quality of service (QoS). Early high-level power estimates are a necessity to project how much power will be consumed by a future processor system. These power projections are used to plan for software use cases and to reassign power-domains of processors and function blocks belonging to the shared PDN. Additionally, it helps to re-design boards and power-cards, re-implement the PDN, change PMIC and plan for additional power, current, voltage and temperature violation related mitigation schemes if the existing solutions are insufficient. The split grid shared PDN that is implemented in a system-on-chip (SoC) is driven by low cost electronics and forces multiple voltage rails for a better energy efficiency. To support this, there is a need for incorporation of voltage levels and power-states into a processor behavioral register transfer level (RTL) model. Low power verification is a must in a split-grid PDN. To facilitate these, the RTL is annotated with voltage supplies and isolation circuits that engage and protect during power collapse scenarios across various voltage domains. The power-aware RTL design is verified, identified and corrected for low power circuit and RTL bugs prior to tape-out. The mandatory features to limit current, power, voltage and temperatures in these high performance and power hungry processor systems introduce a need to provide high level power projections for a processor system accounting for various split-grid PDN supplying V [subscript DD] to the processor, the interface bus, various function blocks, and co-processors. To solve this problem, a power prediction solution is provided that has an average-power error of 8% in prediction and works with reasonable accuracy by tracking instantaneous power for unknown software application traces. The compute time to calculate power using the generated prediction model is 100000X faster and uses 100X less compute memory compared to a commercial electronic design automation (EDA) RTL power tool. This solution is also applied to generate a digital power meter (DPM) in hardware for real-time power estimates while the processor is operational. These high-level power estimates project the potential peak-currents in these processor systems. This resulted in a need for new tests to be created and validated on silicon in order to functionally stress the split-grid shared PDN for extreme voltage droop and sustained high current usage scenarios. For this reason, functional test sequences are created for high power and voltage stress testing of multi-threaded processors. The PDN is a complex system and needs different functional test sequences to generate various kinds of high and low power instruction packets that can stress it. These voltage droop stress tests affect V [subscript MIN] margins in various voltage and frequency modes of operation in a commercial multi-threaded processor. These results underscore a need for voltage mitigation solutions. The processor system operating on a split grid shared PDN can have its V [subscript MIN] increased due to voltage stress tests or a power-virus software application. The shared PDN imposes requirements to mitigate the voltage noise at the source and avoid any possibility of increases to the shared PDN V [subscript DD]. This necessitates implementing a proactive system that can mitigate voltage droop before it occurs while lowering the processor’s minimum voltage of operation (V [subscript MIN]) to help in system power reduction. To mitigate the voltage droops, a proactive clock gating system (PCGS) is implemented with a voltage clock gate (VCG) circuit that uses a digital power meter (DPM) and a model of a PDN to predict the voltage droop before its occurrence. Silicon results show PCGS achieves 10% higher clock frequency (F [subscript CLK]) and 5% lower supply voltage (V [subscript DD]) in a 7nm processor. Questions arise about the effectiveness of PCGS over a reactive voltage droop mitigation scheme in the context of a shared PDN. This results in analysis of PCGS and its comparison against a reactive voltage droop mitigation scheme. This work shows the importance of voltage droop mitigation reaction time for a split grid shared PDN and highlights benefits of PCGS in its ability to provide better V [subscript MIN] of the entire split grid shared PDN. The silicon results from power-stress tests shows the possibility of the high-power processor system exceeding board or power-supply card current capacity and thermal violations. This requires designing a limiting system that can adapt processor performance. This limiting system is expected to meet the stringent system latency of 1 µs for sustained peak-current violations and react in the order of milli-seconds for thermal mitigation. It is also expected of this system to maintain the desired Quality of Service (QoS) of the multi-threaded processor. This results in implementation of a current and temperature limiting response circuit in a 7nm commercial processor. The randomized pulse modulation (RPM) circuit adapts processor performance and reduces current violations in the system within 1 µs and maintains thread fairness with a 0.4% performance resolution across a wide range of operation from 100% to 0.4%. Hard requirements from SoC software and hardware require the processor systems to be within the TDP and power budgets and processors sharing the split gird PDN. Power consumed by the threads (processors) are now exceeded by added functionality of new threads (processors), which could consume much higher power compared to power of previous generation processors. The threads (processors) operate cohesively in a multi-threaded processor system and though there is a large difference in magnitude of power profiles across threads (processors), the overall performance of the multi-threaded processor is not expected to be compromised. This enforces a need for a power limiting system that can specifically slow down the high-power threads (processors) to meet power-budgets and not affect performance of low-power threads. For this reason, a thread specific multi-thread power limiting (MTPL) mechanism is designed that monitors the processor power consumption using the per thread DPM (PTDPM). Implemented in 7nm for a commercial processor, silicon results demonstrate that the thread specific MTPL does not affect the performance of low power threads during power limiting until the current (power) is limited to very low values. For high power threads and during higher current (power) limiting scenarios, the thread specific MTPL shows similar performance to a conventional global limiting mechanism. Thus, the thread specific MTPL enables the multi-threaded processor system to operate at a higher overall performance compared to a conventional global mechanism across most of the power budget range. For the same power budget, the processor performance can be up to 25% higher using the thread specific MPTL compared to using a global power limiting scheme. In summary, in this dissertation design for power concepts are presented for a processor system on a split-grid shared PDN through various solutions that address challenges in high-power processors and help alleviate potential problems. These solutions range from embedding power-intent, to incorporating voltage droop prediction intelligence through power usage estimation, maintaining quality of service within a stringent system latency, to slowing down specific high-power threads of a multi-threaded processor. All these methods can work cohesively to incorporate power-awareness in the processor systems, making the processors energy efficient and operate reliably within the TDP.