Probabilistic design for emerging memory and nanometer-scale logic
MetadataShow full item record
As semiconductor technology has scaled down, the impact of stochastic behavior in very large scale integrated circuits (VLSI) has become an ever-more important concern. This dissertation investigates two distinct classes of problems that require the use of probabilistic methods and models: (1) Modeling and exploiting stochastic behavior in advanced memory technologies; (2) Probabilistic modeling of faults due to on-chip voltage variation. This dissertation first investigates the unique physics-level stochasticity of spin-transfer torque magnetic RAM (STT-RAM). The write process of STT-RAM is stochastic: specifically, the write time of a bitcell varies significantly. The wors-tcase approach, which uses the longest write pulse duration, guarantees a successful write; however, it introduces significant energy overhead due to excessive margins since the average write pulse duration is far shorter than the worst-case pulse duration. This dissertation develops novel circuit techniques to exploit the stochastic properties of STT-RAM write operation for energy savings by moving away from the worst-case approach to dynamic strategies while maintaining the required low error rate. The first contribution is a variable energy write (VEW) architecture that effectively exploits the wide distribution of write time to greatly reduce energy via a mechanism that checks the instantaneous state of the bitcell and deactivates the write current once the correct value has registered. The second contribution is a multiple attempt write (MAW) strategy that utilizes the asymptotic temporal stochastic independence of repeated switching events to achieve a dramatic reduction in energy. The proposed architectures are evaluated using a compact STT-RAM cell model. Analysis indicates that VEW succeeded in reducing the write energy by 94.7% with approximately 1% relative area overhead under an efficient design methodology compared with the conventional designs relying on the worst case approach. MAW reduced the overall write energy by 94.6% with approximately 0.05% relative area overhead. This dissertation then addresses the problem of probabilistic modeling of faults due to on-chip voltage variations. The power supply voltage variation can increase gate delay, resulting in timing faults on near-critical paths. These low-level faults ultimately propagate to architecture and application levels, often leading to critical system failures. Developing an accurate fault model and injection tool that generates and propagates faults from circuit- to gate-level is important for accurately predicting the resulting system failures. This is challenging since the model needs to accurately capture the physical characteristics at the circuit level that define the likelihood of a fault and use that information to guide the injection with the proper probability. At the same time, the analysis and fault injections need to be computationally manageable to allow analysis of realistic systems under realistic workloads. The conventional fault models rely on either Monte Carlo sampling or time-consuming runtime simulation using the worst-case voltage drop. To overcome simulation overheads of runtime circuit-level simulation, a novel two-phase approach is proposed. The main idea is that circuit characterization can be done before simulation. The result of pre-characterization is used at runtime via a form of look-up to enable gate-level efficiency. The two-phase methodology is time-efficient but may require high memory unless the look-up tables are carefully optimized. This dissertation also develops the fault probability estimation based on workload-specific voltage distribution, rather than a fixed worst-case voltage. The proposed methodology is implemented on an OpenSPARC design targeting on a 32nm technology node. Analysis indicates the proposed fault modeling and injection flow reduces runtime overhead by 24X compared to the previously best-known gate-level fault simulator while having circuit level accuracy.